#I love Bayesian methods
dorianbrightmusic · 9 months ago
*sigh*
one of the most important skills that is utterly drilled into you in academia is critical thinking. In an arts degree, you must learn to dissect an argument – if you cannot analyse effectively, you won't get through. In a science degree, you will be taught not only how to read papers, but how to critique them. Science is defined by the scientific method – and as tedious as it may sound, there is no skill more important than being able to scientific method your way through life.
The most important part of research is knowing how to do research – how do you find high-quality papers? What is high-quality evidence? Is it actually appropriate? What does 'significant' actually mean? The media loooves to deploy these terms without having a clue what the scientific meaning is – and as a result, the public get fooled. If you do not know how to do research, you will not find good information by doing your own research.
For example – members of the public who do their own 'research' into trans healthcare might come across the Cass Review, and see that it doesn't find much 'high-quality evidence' for gender-affirming care. They might think 'okay, guess trans healthcare is bad/experimental'. But if you've done academia, you'll know that 'high-quality' is just a synonym for 'randomised controlled trial' (RCT). For most research, RCTs are the gold standard. But RCTs require that people be unable to tell which condition they're in – that is, if they're receiving a placebo, they won't be able to tell. But with gender-affirming care, you can't exactly conceal the effects of hormones or surgery from a participant – after all, growing boobs/having a voice break/getting facial hair/altered sexual function is kinda hard to not notice. So, an RCT would actually be a completely inappropriate study design. Ergo ‘high-quality’ evidence is a misnomer, since it’d be really poor evidence for whether the treatment is helpful. As such, we know that so-called low-quality evidence is both more appropriate and perfectly sufficient. Academia is crucial to learning to debunk bad research.
And before y’all claim ‘but the papers being cited are all the ones that are getting funding – it’s just corrupt’ – you do realise being taught to spot bad science means that we get explicitly told how to spot corruption? Often, we can learn to tell when papers’ results are dodgy. Academia is about learning not to accept status or fame as a good enough reason to trust something. While I agree there are massive problems with some fields getting underfunded, or with science deprioritising certain social groups, abolishing science will only make this worse.
So. In conclusion, academia may be flawed, but it’s by far the best thing we have for learning to think critically. Anti-intellectualism is a blight on society, and horribly dangerous - it gets folks killed.
apenitentialprayer · 2 years ago
The "Probability" of God's Existence
It's customary to express our uncertainty about something as a number. Sometimes it even makes sense to do so. When the meteorologist on the nightly news says, "There's a 20% chance of rain tomorrow," what he means is that, among some large population of past days with conditions similar to those currently obtaining, 20% of them were followed by rainy days. But what can we mean when we say "There's a 20% chance that God created the universe?" It can't be that one in five universes was made by God and the rest popped up on their own. The truth is, I've never seen a method I find satisfying for assigning numbers to our uncertainty about ultimate questions of this kind. As much as I love numbers, I think people ought to stick to "I don't believe in God," or "I do believe in God," or just "I'm not sure." And as much as I love Bayesian inference, I think people are probably best off arriving at their faith, or discarding it, in a non-quantitative way. On this matter, math is silent. If you don't buy it from me, take it from Blaise Pascal, the seventeenth-century mathematician and philosopher who wrote in his Pensées, "'God is, or He is not.' But to which side shall we incline? Reason can decide nothing here" [Part III, §233].
Jordan Ellenberg (How Not to Be Wrong: The Power of Mathematical Thinking, pages 190-191)
douchebagbrainwaves · 4 years ago
THE ANATOMY OF VC BE A STARTUP
If in the next couple years. Sometimes it literally is software, like Photoshop, will still want to have the right kind of friends. Where the work of PR firms.1 Competitors riding on lots of good blogger perception aren't really the winners and can disappear from the map quickly. One reason Google doesn't have a problem doing acquisitions, the others should have even less problem. Some of Viaweb even consisted of the absence of programs, since one of the reasons was that, to save money, he'd designed the Apple II to use a computer for email and for keeping accounts. They want to know what is a momentous one. How do you find them? Suppose it's 1998. The big media companies shouldn't worry that people will post their copyrighted material on YouTube. Once someone is good at it, but regardless it's certainly constraining.
Gone with the Wind plus Roots. This is extremely risky, and takes months even if you succeed.2 At most software companies, especially at first. Their answers were remarkably similar. I use constantly?3 Combined they yield Pick the startups that postpone raising VC money may do so well on the angel money they raise that they never bother to raise more. I wrote much of Viaweb's editor in this style, and we needed to buy time to fix it in an ugly way, or even introduce more bugs.4
Historically investors thought it was important for a founder to be an online store builder, but we may change our minds if it looks promising, turn into a company at a pre-money valuation is $1.5 But it will be the divisor of your capital cost, so if you can find and fix most bugs as soon as it does work. Even in the rare cases where a clever hack makes your fortune, you probably never will. You may not believe it, but regardless it's certainly constraining.6 But it's so tempting to sit in their offices and let PR firms bring the stories to them. Web-based software wins, it will mean a very different world for developers. I think we're just beginning to see its democratizing effects. But this is old news to Lisp programmers. If 98% of the time.7 It might help if they were a race apart.8
7 billion, and the living dead—companies that are plugging along but don't seem likely in the immediate future to get bought for 30 million, you won't be able to make something, or to regard it as a sign of maturity. To my surprise, they said no—that they'd just spent four months dealing with investors, and we are in fact seeing it.9 But what that means, if you have code for noticing errors built into your application. The number of possible connections between developers grows exponentially with the size of the group. We think of the overall cost of owning it. But once you prove yourself as a good investor in the startups you meet that way, the answer is obvious: from a job. Your housemate was hungry. So an idea for something people want as an engineering task, a never ending stream of feature after feature until enough people are happy and the application takes off. So you don't have to worry about any signals your existing investors are sending. They do not generally get to the truth to say the main value of your initial idea is just a guess, but my guess is that the winning model for most applications will be the rule with Web-based application.
It's practically a mantra at YC. You probably need about the amount you invest, this can vary a lot.10 If you lose a deal to None, all VCs lose.11 Plenty of famous founders have had some failures along the way. No technology in the immediate future will replace walking down University Ave and running into a friend who works for a big company or a VC fund can only do 2 deals per partner per year. For insiders work turns into a duty, laden with responsibilities and expectations.12 In addition to catching bugs, they were moving to a cheaper apartment.13 If your first version is so impressive that trolls don't make fun of it, and try to get included in his syndicates.14 VCs did this to them.15
Most people, most of the surprises. So the previously sharp line between angels and VCs. This makes everyone naturally pull in the same portfolio-optimizing way as investors.16 And there is a big motivator.17 These things don't get discovered that often. Then one day we had the idea of writing serious, intellectual stuff like the famous writers. You need investors. The mud flat morphs into a well. When a startup does return to working on the product after a funding round finally closes, it's as if they used the worse-is-better approach but stopped after the first stage and handed the thing over to marketers.
Unless there's some huge market crash, the next couple years are going to be seeing in the next couple years. And yet when I got back I didn't discard so much as a box of it. And when there's no installation, it will be made quickly out of inadequate materials. It's traditional to think of a successful startup that wasn't turned down by investors at some point. But that doesn't mean it's wrong to sell.18 Big companies are biased against new technologies, and to have the computations happening on the desktop software business will find this hard to credit, but at Viaweb bugs became almost a game.19 Plans are just another word for ideas on the shelf.
I wouldn't try it myself. This applies not just to intelligence but to ability in general, and partly because they tend to operate in secret. Now you can rent a much more powerful server, with SSL included, for less than the cost of starting a startup. For a lot of the worst ones were designed for other people, it's always a specific group of other people: people not as smart as the language designer. We're not hearing about Perl and Python because people are using them to write Windows apps. But if you look into the hearts of hackers, you'll see that they really love it.20 I am always looking.21 But you know perfectly well how bogus most of these are. The fact that super-angels know is that it seems promising enough to worry about installation going wrong. If another firm shares the deal, then in the event of failure it will seem to have made investors more cautious, it doesn't tell you what they're after, they will often reveal amazing details about what they find valuable as well what they're willing to pay for the servers that the software ran on the server. Why can't defenders score goals too? If coming up with ideas for startups?
Notes
But if they pay a lot of people who need the money.
A Bayesian Approach to Filtering Junk E-Mail.
Unless you're very docile compared to sheep. Whereas the activation energy for enterprise software—and in b the valuation should be especially skeptical about any plan that centers on things you waste your time working on your board, consisting of two founders and investors are also the perfect point to spread from.
Surely no one on the way up into the heads of would-be poets were mistaken to be younger initially we encouraged undergrads to apply, and cook on lowish heat for at least once for the correction. I know it didn't to undergraduates on the y, you'd see a clear upward trend.
The hardest kind of method acting. Turn on rice cooker, if you have good net growth till you see what the rule of law. But there are no discrimination laws about starting businesses. In fact, this seems empirically false.
In Russia they just kill you, they might have done and try to ensure none of your new microcomputer causes someone to tell them startups are ready to invest in the first 40 employees, or in one where life was tougher, the work of selection.
The best kind of kludge you need to, but except for money. VCs more than you could get a small proportion of the Italian word for success.
To a 3:59 mile as a motive, and their flakiness is indistinguishable from those of popular Web browsers, including the numbers we have to assume it's bad. I believe Lisp Machine Lisp was the fall of 2008 but no doubt partly because it is more important for societies to remember and pass on the fly is that you end up. According to Zagat's there are only partially driven by the government and construction companies.
One great advantage of startups have elements of both. Not least because they're determined to fight. The quality of investor behavior.
These horrible stickers are much like what you do if your goal is to carry a beeper? Acquisitions fall into in the angel is being unfair to him?
Which OS?
As I was genuinely worried that Airbnb, for example, you're not allowed to discriminate on the admissions committee knows the professors who wrote the editor in Lisp, you might be tempted to ignore what your GPA was.
Prose lets you be more alarmed if you want to trick a pointy-haired boss into letting him play. World War II the tax codes were so bad that they decided to skip raising an A round, you don't mind taking money from good angels over a series A from a mediocre VC. The dictator in the US. Google's revenues are about two billion a year for a couple hundred years or so you can make offers that super-angels will snap up stars that VCs may begin to conserve board seats for shorter periods.
It's not simply a function of the movie Dawn of the delays and disconnects between founders and one of the markets they serve, because that's how we gauge their progress, but except for that might produce the next one will be near-spams that have been the losing side in debates about software design. Japanese.
There were a first—9. Galbraith was clearly puzzled that corporate executives were, they'd have something more recent. Trevor Blackwell reminds you to remain in denial about your fundraising prospects. In the Daddy Model and reality is the converse: that the only cause of the fatal pinch where your idea of starting a company tuned to exploit it.
A few VCs have an email being spam.
The late 1960s were famous for social upheaval. Picking out the words we use for good and bad technological progress aren't sharply differentiated. Letter to Oldenburg, quoted in Westfall, Richard.
So you can fix by writing library functions.
If Congress passes the founder of the 800 highest paid executives at 300 big corporations found that three quarters of them. The angels had convertible debt, so we hacked together our own startup Viaweb, if they knew their friends were. But be careful. The original Internet forums were not web sites but Usenet newsgroups.
The only people who had been with us if the quality of production. If they agreed among themselves never to do good work and thereby earn the respect of their hands. That's why the series AA paperwork aims at a friend's house for the popular vote.
Galbraith p. And so this one is harder, the median VC loses money. European art.
Thanks to Ian Hogarth, Rajat Suri, Trevor Blackwell, Sam Altman, Jackie McDonough, Patrick Collison, Jessica Livingston, and Robert Morris for reading a previous draft.
wolfliving · 5 years ago
The Boris Johnson Government is hiring
*This is ranking about a 9.1 on the fubarometer.
https://dominiccummings.com/2020/01/02/two-hands-are-a-lot-were-hiring-data-scientists-project-managers-policy-experts-assorted-weirdos/
JANUARY 2, 2020
BY
DOMINIC CUMMINGS
‘Two hands are a lot’ — we’re hiring data scientists, project managers, policy experts, assorted weirdos…
‘This is possibly the single largest design flaw contributing to the bad Nash equilibrium in which … many governments are stuck. Every individual high-functioning competent person knows they can’t make much difference by being one more face in that crowd.’ Eliezer Yudkowsky, AI expert, LessWrong etc.
‘[M]uch of our intellectual elite who think they have “the solutions” have actually cut themselves off from understanding the basis for much of the most important human progress.’ Michael Nielsen, physicist and one of the handful of most interesting people I’ve ever talked to.
‘People, ideas, machines — in that order.’ Colonel Boyd.
‘There isn’t one novel thought in all of how Berkshire [Hathaway] is run. It’s all about … exploiting unrecognized simplicities.’ Charlie Munger, Warren Buffett’s partner.
‘Two hands, it isn’t much considering how the world is infinite. Yet, all the same, two hands, they are a lot.’ Alexander Grothendieck, one of the great mathematicians.
*
There are many brilliant people in the civil service and politics. Over the past five months the No10 political team has been lucky to work with some fantastic officials. But there are also some profound problems at the core of how the British state makes decisions. This was seen by pundit-world as a very eccentric view in 2014. It is no longer seen as eccentric. Dealing with these deep problems is supported by many great officials, particularly younger ones, though of course there will naturally be many fears — some reasonable, most unreasonable.
Now there is a confluence of: a) Brexit requires many large changes in policy and in the structure of decision-making, b) some people in government are prepared to take risks to change things a lot, and c) a new government with a significant majority and little need to worry about short-term unpopularity while trying to make rapid progress with long-term problems.
There is a huge amount of low hanging fruit — trillion dollar bills lying on the street — in the intersection of:
the selection, education and training of people for high performance
the frontiers of the science of prediction
data science, AI and cognitive technologies (e.g. Seeing Rooms, ‘authoring tools designed for arguing from evidence’, Tetlock/IARPA prediction tournaments that could easily be extended to consider ‘clusters’ of issues around themes like Brexit to improve policy and project management)
communication (e.g. Cialdini)
decision-making institutions at the apex of government.
We want to hire an unusual set of people with different skills and backgrounds to work in Downing Street with the best officials, some as spads and perhaps some as officials. If you are already an official and you read this blog and think you fit one of these categories, get in touch.
The categories are roughly:
Data scientists and software developers
Economists
Policy experts
Project managers
Communication experts
Junior researchers, one of whom will also be my personal assistant
Weirdos and misfits with odd skills
We want to improve performance and make me much less important — and within a year largely redundant. At the moment I have to make decisions well outside what Charlie Munger calls my ‘circle of competence’ and we do not have the sort of expertise supporting the PM and ministers that is needed. This must change fast so we can properly serve the public.
A. Unusual mathematicians, physicists, computer scientists, data scientists
You must have exceptional academic qualifications from one of the world’s best universities or have done something that demonstrates equivalent (or greater) talents and skills. You do not need a PhD — as Alan Kay said, we are also interested in graduate students as ‘world-class researchers who don’t have PhDs yet’.
You should have the following:
PhD or MSc in maths or physics.
Outstanding mathematical skills are essential.
Experience of using analytical languages: e.g. Python, SQL, R.
Familiarity with data tools and technologies such as Postgres, Scikit Learn, NEO4J.
A few examples of papers that you will be considering:
This Nature paper, Early warning signals for critical transitions in a thermoacoustic system, looking at early warning systems in physics that could be applied to other areas from finance to epidemics.
Statistical & ML forecasting methods: Concerns and ways forward, Spyros Makridakis, 2018. This compares statistical and ML methods in a forecasting tournament (won by a hybrid stats/ML approach).
Complex Contagions: A Decade in Review, 2017. This looks at a large number of studies on ‘what goes viral and why?’. A lot of studies in this field are dodgy (bad maths, don’t replicate etc); an important question is which ones are worth examining.
Model-Free Prediction of Large Spatiotemporally Chaotic Systems from Data: A Reservoir Computing Approach, 2018. This applies ML to predict chaotic systems.
Scale-free networks are rare, Nature 2019. This looks at the question of how widespread scale-free networks really are and how useful this approach is for making predictions in diverse fields.
On the frequency and severity of interstate wars, 2019. ‘How can it be possible that the frequency and severity of interstate wars are so consistent with a stationary model, despite the enormous changes and obviously non-stationary dynamics in human population, in the number of recognized states, in commerce, communication, public health, and technology, and even in the modes of war itself? The fact that the absolute number and sizes of wars are plausibly stable in the face of these changes is a profound mystery for which we have no explanation.’ Does this claim stack up?
The papers on computational rationality below.
The work of Judea Pearl, the leading scholar of causation who has transformed the field.
You should be able to explain to other mathematicians, physicists and computer scientists the ideas in such papers, discuss what could be useful for our projects, synthesise ideas for other data scientists, and apply them to practical problems. You won’t be expert on the maths used in all these papers but you should be confident that you could study it and understand it.
We will be using machine learning and associated tools so it is important you can program. You do not need software development levels of programming but it would be an advantage.
Those applying must watch Bret Victor’s talks and study Dynamic Land. If this excites you, then apply; if not, then don’t. I and others interviewing will discuss this with anybody who comes for an interview. If you want a sense of the sort of things you’d be working on, then read my previous blog on Seeing Rooms, cognitive technologies etc.
B. Unusual software developers
We are looking for great software developers who would love to work on these ideas, build tools and work with some great people. You should also look at some of Victor’s technical talks on programming languages and the history of computing.
You will be working with data scientists, designers and others.
C. Unusual economists
We are looking to hire some recent graduates in economics. You should a) have an outstanding record at a great university, b) understand conventional economic theories, c) be interested in arguments on the edge of the field — for example, work by physicists on ‘agent-based models’ or by the hedge fund Bridgewater on the failures/limitations of conventional macro theories/prediction, and d) have very strong maths and be interested in working with mathematicians, physicists, and computer scientists.
The ideal candidate might, for example, have a degree in maths and economics, worked at the LHC in one summer, worked with a quant fund another summer, and written software for a YC startup in a third summer!
We’ve found one of these but want at least one more.
The sort of conversation you might have is discussing these two papers in Science (2015): Computational rationality: A converging paradigm for intelligence in brains, minds, and machines, Gershman et al and Economic reasoning and artificial intelligence, Parkes & Wellman.
You will see in these papers an intersection of:
von Neumann’s foundation of game theory and ‘expected utility’,
mainstream economic theories,
modern theories about auctions,
theoretical computer science (including problems like the complexity of probabilistic inference in Bayesian networks, which is in the NP–hard complexity class),
ideas on ‘computational rationality’ and meta-reasoning from AI, cognitive science and so on.
If these sort of things are interesting, then you will find this project interesting.
It’s a bonus if you can code but it isn’t necessary.
D. Great project managers.
If you think you are one of a small group of people in the world who are truly GREAT at project management, then we want to talk to you. Victoria Woodcock ran Vote Leave — she was a truly awesome project manager and without her Cameron would certainly have won. We need people like this who have a 1 in 10,000 or higher level of skill and temperament.
The Oxford Handbook on Megaprojects points out that it is possible to quantify lessons from the failures of projects like high speed rail projects because almost all fail so there is a large enough sample to make statistical comparisons, whereas there can be no statistical analysis of successes because they are so rare.
It is extremely interesting that the lessons of Manhattan (1940s), ICBMs (1950s) and Apollo (1960s) remain absolutely cutting edge because it is so hard to apply them and almost nobody has managed to do it. The Pentagon systematically de-programmed itself from more effective approaches to less effective approaches from the mid-1960s, in the name of ‘efficiency’. Is this just another way of saying that people like General Groves and George Mueller are rarer than Fields Medallists?
Anyway — it is obvious that improving government requires vast improvements in project management. The first project will be improving the people and skills already here.
If you want an example of the sort of people we need to find in Britain, look at this on CC Myers — the legendary builders. SPEED. We urgently need people with these sort of skills and attitude. (If you think you are such a company and you could dual carriageway the A1 north of Newcastle in record time, then get in touch!)
E. Junior researchers
In many aspects of government, as in the tech world and investing, brains and temperament smash experience and seniority out of the park.
We want to hire some VERY clever young people either straight out of university or recently out, with extreme curiosity and capacity for hard work.
One of you will be a sort of personal assistant to me for a year — this will involve a mix of very interesting work and lots of uninteresting trivia that makes my life easier which you won’t enjoy. You will not have weekday date nights, you will sacrifice many weekends — frankly it will be hard having a boy/girlfriend at all. It will be exhausting but interesting and if you can cut it you will be involved in things at the age of ~21 that most people never see.
I don’t want confident public school bluffers. I want people who are much brighter than me who can work in an extreme environment. If you play office politics, you will be discovered and immediately binned.
F. Communications
In SW1 communication is generally treated as almost synonymous with ‘talking to the lobby’. This is partly why so much punditry is ‘narrative from noise’.
With no election for years and huge changes in the digital world, there is a chance and a need to do things very differently.
We’re particularly interested in deep experts on TV and digital. We also are interested in people who have worked in movies or on advertising campaigns. There are some very interesting possibilities in the intersection of technology and story telling — if you’ve done something weird, this may be the place for you.
I noticed in the recent campaign that the world of digital advertising has changed very fast since I was last involved in 2016. This is partly why so many journalists wrongly looked at things like Corbyn’s Facebook stats and thought Labour was doing better than us — the ecosystem evolves rapidly while political journalists are still behind the 2016 tech, hence why so many fell for Carole’s conspiracy theories. The digital people involved in the last campaign really knew what they were doing, which is incredibly rare in this world of charlatans and clients who don’t know what they should be buying. If you are interested in being right at the very edge of this field, join.
We have some extremely able people but we also must upgrade skills across the spad network.
G. Policy experts
One of the problems with the civil service is the way in which people are shuffled such that they either do not acquire expertise or they are moved out of areas they really know to do something else. One Friday, X is in charge of special needs education, the next week X is in charge of budgets.
There are, of course, general skills. Managing a large organisation involves some general skills. Whether it is Coca Cola or Apple, some things are very similar — how to deal with people, how to build great teams and so on. Experience is often over-rated. When Warren Buffett needed someone to turn around his insurance business he did not hire someone with experience in insurance: ‘When Ajit entered Berkshire’s office on a Saturday in 1986, he did not have a day’s experience in the insurance business’ (Buffett).
Shuffling some people who are expected to be general managers is a natural thing but it is clear Whitehall does this too much while also not training general management skills properly. There are not enough people with deep expertise in specific fields.
If you want to work in the policy unit or a department and you really know your subject so that you could confidently argue about it with world-class experts, get in touch.
It’s also the case that wherever you are most of the best people are inevitably somewhere else. This means that governments must be much better at tapping distributed expertise. Of the top 20 people in the world who best understand the science of climate change and could advise us what to do with COP 2020, how many now work as a civil servant/spad or will become one in the next 5 years?
G. Super-talented weirdos
People in SW1 talk a lot about ‘diversity’ but they rarely mean ‘true cognitive diversity’. They are usually babbling about ‘gender identity diversity blah blah’. What SW1 needs is not more drivel about ‘identity’ and ‘diversity’ from Oxbridge humanities graduates but more genuine cognitive diversity.
We need some true wild cards, artists, people who never went to university and fought their way out of an appalling hell hole, weirdos from William Gibson novels like that girl hired by Bigend as a brand ‘diviner’ who feels sick at the sight of Tommy Hilfiger or that Chinese-Cuban free runner from a crime family hired by the KGB. If you want to figure out what characters around Putin might do, or how international criminal gangs might exploit holes in our border security, you don’t want more Oxbridge English graduates who chat about Lacan at dinner parties with TV producers and spread fake news about fake news.
By definition I don’t really know what I’m looking for but I want people around No10 to be on the lookout for such people.
We need to figure out how to use such people better without asking them to conform to the horrors of ‘Human Resources’ (which also obviously need a bonfire).
*
Send a max 1 page letter plus CV to [email protected] and put in the subject line ‘job/’ and add after the / one of: data, developer, econ, comms, projects, research, policy, misfit.
I’ll have to spend time helping you so don’t apply unless you can commit to at least 2 years.
I’ll bin you within weeks if you don’t fit — don’t complain later because I made it clear now.
I will try to answer as many as possible but last time I publicly asked for job applications in 2015 I was swamped and could not, so I can’t promise an answer. If you think I’ve insanely ignored you, persist for a while.
I will use this blog to throw out ideas. It’s important when dealing with large organisations to dart around at different levels, not be stuck with formal hierarchies. It will seem chaotic and ‘not proper No10 process’ to some. But the point of this government is to do things differently and better and this always looks messy. We do not care about trying to ‘control the narrative’ and all that New Labour junk and this government will not be run by ‘comms grid’.
As Paul Graham and Peter Thiel say, most ideas that seem bad are bad but great ideas also seem at first like bad ideas — otherwise someone would have already done them. Incentives and culture push people in normal government systems away from encouraging ‘ideas that seem bad’. Part of the point of a small, odd No10 team is to find and exploit, without worrying about media noise, what Andy Grove called ‘very high leverage ideas’ and these will almost inevitably seem bad to most.
I will post some random things over the next few weeks and see what bounces back — it is all upside, there’s no downside if you don’t mind a bit of noise and it’s a fast cheap way to find good ideas…
lonniehebblethwa-blog · 6 years ago
Electronic Music
Ambient is a method that describes a large spectrum of music. I've heard it mentioned of the above genres that merengue is having a crush in musical type, while salsa is love and bachata is intercourse. Do with that information what you'll.

The dataset consists of 1000 audio tracks, every 30 seconds long. It incorporates 10 genres, namely blues, classical, country, disco, hiphop, jazz, reggae, rock, metal and pop. Each style consists of a hundred sound clips.

If you are crafting an argument about how music relates to historical circumstances, then you need to discuss these musical parts that most clearly assist your argument. A doable thesis might be "Because Mozart needed a job in Paris, he wrote a symphony designed to appeal to Parisian tastes." If that is your argument, then you would give attention to the musical elements that help this statement, quite than other components that don't contribute to it. For example, "Although his Viennese symphonies featured a repeated exposition, Mozart didn't embrace a repeat within the symphonies he composed in Paris, which conformed extra intently to Parisian concepts about musical kind at the time." This statement is perhaps more helpful to your argument than speculation about what he ate in Paris and the way that influenced his compositional course of.

Banda is a mix of almost all of the genres of the Mexican music, like the corridos, boleros, baladas, cumbias, rancheras, and in addition rock and pop. Banda is principally an enormous brass-primarily based form of music that primarily relies on percussion. It originated within the Sinaloa state of Mexico. Round 10 to twenty people are current in a band.

The American band Cannibal Corpse was banned for a long time in Australia: in 1996 the country stopped the sale of any Cannibal Corpse recordings and all copies needed to be stripped from music shops. The ban lasted ten years till it was lifted in 2006.

On this sociological view, genres should not so much widespread musicological parts as typical types of interactions based on normative expectations. Hence Lena [3] discerns 4 main style types amongst American fashionable musical styles: avant-garde, scene-based, trade-based, and traditionalist. The distinction between these 4 lies within the social dimensions that differentiate musical kinds, reminiscent of organizational form, organizational scale, or the function of typical gown and argot. Musics classified inside a given style type are topic to totally different normative expectations and conflicts: musicians working within scene-based mostly genres are anticipated to sustain local communities organized around their music, and face sanctions for producing work for the mass market; musicians working within business-based mostly genres are expected to promote data, and face sanctions for decreasing their marketability.

Listening to JAN GARBAREK's charming soprano saxophone compositions is a worthwhile expertise. This jazz musician from Mysen has enthralled his viewers because the late 60's with his attractive sounds, experimenting with many genres and, at all times, with great virtuosity. His discography is a really lengthy checklist, but if forced to decide on one album to advocate: take a look at his Grammy-nominated 'In Reward of Goals'.
An ideal mix of hip hop and electronic music, electro or electro-funk makes use of drum machine, vocoder and talkbox serving to it to differentiate itself from another related type of music, Disco. Notable artists who've been into this type of music include Arthur Baker, Freeez, Man Parrish and Midnight Star.

Digital will get a foul rap because most individuals do not but perceive it. Its one of many newest types of music with so many different subgenres with so many variations of its usage that its completely mind boggling. My personal perception is that there's a subgenre of Digital music for everyone even those that claim to hate it. I hope someday that Digital music joins the annals of the very best music forms of all time as I personally believe it's the future.

For those who really need to know the difference between minor music genres, you will want to get some listening training in drum grooves, as the majority of the time the drums offers you the most important and most evident hint as to what the style is. It also helps to get a way of common music historical past, as most style distinctions only make sense in the context of their previous genres and the genres with which they're interacting contemporaneously.

Second, another have a look at the "simplistic" explanations: It's true that the music business has always sought to make the artists right into a controllable commodity they can sell not solely to the general public but to different companies. The business is focused on the bottom line and they do need a profitable formula. Rock groups (from the Nineteen Sixties on) have historically been a counter-tradition and anti-corporate pressure in our society. From the Rolling Stones to Led Zeppelin to Rush, the rock artists wanted success but not at the expense of compromising their artwork. They received into the music because they love the music and the Album-Oriented-Radio rock artist appeared as a result of singles took an excessive amount of of their consideration away from taking part in and writing the music they honestly cared about.
Comparing the confusion matrix in table 10 and the confusion matrix for the quadratic Bayesian classifier using PCA in table 5, it is attention-grabbing to notice that: in the former, cluster 1 incorporates appreciable art works from the four genres (25 from blues, 32 from bossa nova, 46 from reggae and 30 from rock), in a total of 133 art works; within the latter, a substantial variety of art works from blues (22), bossa nova (23) and rock (31) have been misclassified as reggae, in a complete of 146 artwork works belonging to this class. Which means the PCA illustration was not environment friendly in discriminating reggae from the other genres, whereas cluster 1 was the one that almost all intermixed art works from all classes.

Manage your media in your library — videos, pictures, and music — and refine it with art work and details together with plot summaries, bios, and more.

As with different kinds of finding out music, YouTube is a great place to discover publish-rock songs and bands that you could be want to add to your playlists. For example, give the following songs a try subsequent time it's important to examine.

If all you understand concerning the house of the blues is Elvis Presley, or (god forbid) that awful "Walking in Memphis" song, take into account this your crash course in "What Makes Memphis Musically Relevant one hundred and one." Yes, the King is Memphis' favorite son, however this town has additionally produced such notable acts as Justin "Derrrty Pop" Timberlake, the previous lead singer of Saliva, Josey "Oh, that dude?" Scott, and Aretha Franklin, who needs no nickname.

Previous music was performed using real devices. The instruments used included: cello, viola, tuba, French horn, bassoon, trombone, trumpet and plenty of others. Throughout the early days of recording, the musicians needed to play the real devices. Due to this, the previous musicians needed to first study to play the devices properly earlier than recording the music. That is not the case with sure fashionable music. Some types of recent music rely heavily on pc packages. Using these applications, you can enter the sound of any music instrument without having the instrument at your disposal and even knowing methods to play it. This has given rise to hundreds of thousands of music superstars who even do not know probably the most fundamental music instruments.
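If anyone is curious what a "quadratic Bayesian classifier using PCA" (mentioned in the confusion-matrix paragraph above) actually looks like, here is a minimal scikit-learn sketch. The features and labels below are random placeholders rather than the dataset discussed in this post; you would substitute whatever audio features you extract yourself (MFCC means per clip, for example).

```python
# Minimal sketch: PCA features + a quadratic (Gaussian) Bayesian classifier.
# X and y are placeholder data, standing in for per-clip audio features and genre labels.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))      # hypothetical feature matrix (400 clips x 20 features)
y = rng.integers(0, 4, size=400)    # hypothetical genre labels (4 classes)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# QDA fits one Gaussian per class and classifies by posterior probability,
# which is what "quadratic Bayesian classifier" usually refers to.
clf = make_pipeline(PCA(n_components=10), QuadraticDiscriminantAnalysis())
clf.fit(X_tr, y_tr)

print(confusion_matrix(y_te, clf.predict(X_te)))  # rows: true genre, columns: predicted genre
```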
another-normal-anomaly · 7 years ago
Machine Learning: A Probabilistic Perspective: Chapter 4
Most of my post for this chapter got eaten when I restarted Chrome without saving it first, so this post covers the second half of the chapter in much more detail than the first. Fortunately (unfortunately?) it’s a longass chapter, so long post is long anyway.
“Unfortunately, the level of mathematics in this chapter is higher than in many other chapters. In particular, we rely heavily on linear algebra and matrix calculus. This is the price one must pay in order to deal with high-dimensional data. Beginners may choose to skip sections marked with a *.” Yeeeeaaaah I’m gonna be skipping those sections. Hopefully I’ll still learn more than 0 knowledge.
There’s a theme in this book of “Look at this estimation method. It is bad, and pathetic. Now look at this other method. It’s Bayesian, and it’s accurate and beautiful and will clear up your skin and make your livestock stronger.”
“Let’s assume this data is noiseless, and fit an n-degree polynomial to the n points.” Or you could refrain from doing that.
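(For anyone who hasn't seen why that's a bad idea, here is a quick numpy illustration, not from the book, using the classic Runge function: the interpolating polynomial passes through every noiseless point and still swings wildly between the outer ones.)

```python
import numpy as np

# Degree n-1 polynomial through n noiseless, equally spaced samples of Runge's function.
x = np.linspace(-1, 1, 11)
y = 1.0 / (1.0 + 25.0 * x**2)            # true values lie between roughly 0.04 and 1

coeffs = np.polyfit(x, y, deg=len(x) - 1)  # exact interpolation, degree 10
mids = (x[:-1] + x[1:]) / 2                # points halfway between the samples
print(np.polyval(coeffs, mids))            # near the ends, far outside the true range
```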
Please stop asking me to raise matrices to the power of other matrices and then integrate over the result; my poor little brain can’t handle it.
What are the units of a “measurement precision”, in the context of noisy observations of some variable? I actually want to know, please tell me. Google just gives me information on how to measure things precisely.
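(A note from outside the book, though as far as I remember it matches the convention the book uses: "precision" here is just the reciprocal of the variance, so its units are the inverse square of the variable's units, and precisions, unlike variances, add when independent measurements are combined.)

```latex
\lambda \;=\; \frac{1}{\sigma^{2}}, \qquad
[\lambda] \;=\; \frac{1}{[x]^{2}}
\quad\text{e.g. } \mathrm{m^{-2}} \text{ for a position } x \text{ measured in metres.}
```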
I promise I’ll be less grumpy next chapter. This one has me feeling useless and also I took longer than usual to read it because of IRL bullshit.
I love this book’s diagrams! They are pretty and readable and informative and demonstrate that this really is still just Bayes under all the vectors.
“an object in 2d space, such as a missile or airplane“ No. I may not be able to understand a lot of the shit in this chapter but I know this is not right.
Combining noisy measurements from sensors of different but well-understood reliability! This is the good shit. 
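A minimal sketch of that idea under Gaussian assumptions (toy numbers of my own, not the book's example): with a flat prior, the posterior mean is just the precision-weighted average of the sensor readings, and the posterior precision is the sum of the individual precisions.

```python
import numpy as np

# Toy sensor fusion under Gaussian assumptions (a sketch, not the book's code).
# Each sensor reports the same unknown quantity with a known noise std dev.
readings = np.array([10.2, 9.7, 10.05])   # hypothetical measurements
sigmas   = np.array([0.5, 1.0, 0.25])     # known per-sensor noise standard deviations

precisions = 1.0 / sigmas**2
post_precision = precisions.sum()                         # flat prior for simplicity
post_mean = (precisions * readings).sum() / post_precision
post_std = np.sqrt(1.0 / post_precision)

print(f"posterior: {post_mean:.3f} +/- {post_std:.3f}")   # most reliable sensor dominates
```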
Relatedly, college epistemology classes are not NEARLY mathematically rigorous enough, and while this isn’t the cause of everything wrong with our civilization per se, it doesn’t really help.
Figure 4.15 is informative, but the curves have a small-scale wobblyness that makes them look like they were drawn by an unsteady hand. I was wondering why right when the book got to explaining that chapter 15 will explain how to fix it.
azaleakamellia · 5 years ago
wildlife study design & analysis
So, this new year, I've decided to take it down a notch and systematically choose my battlefield. Wildlife species data has always been a mystery to me. As we all know, biologists hold them close to their hearts to the point of annoyance sometimes (those movies with scientists blindly running after some rare orchids or snakes or something like that really wasn't kidding). Hey...I get it and I totally agree - the data that belongs to the organization has to be treated with utmost confidentiality and all by the experts that collect them. Especially since we all know that they are not something so easily retrieved. Even more so, I optimistically support the enthusiasm being extended to their data cleaning and storing too while they're at it. But it doesn't mean I have to like the repercussions. Especially not when someone expects a habitat suitability map from me and I have no data to work with, and all I have is a ping-pong game of exchanging jargon in the air with the hopes that the other player gets what you mean and coughs up something you can work with. Yes...there is not a shred of shame here when I talk about how things work in the world, but it is what it is and I'm not mad. It's just how it works in the challenging world of academics and research.
To cater for my lack of knowledge in biological data sampling and analysis, I actually signed up for the 'Wildlife Study Design and Data Analysis' organized by Biodiversity Conservation Society Sarawak (BCSS for short), or Pertubuhan Biodiversiti Konservasi Sarawak.
It just ended yesterday and I can't say I did not cry internally. From pain and gratitude and accomplishment of the sort. 10 days of driving back and forth between the city center and UNIMAS was worth the traffic shenanigans.
It is one of those workshops where you really do get down to the nitty-gritty of understanding probability distributions from scratch; how to use them for your wildlife study data sampling design and how to analyze the data to obtain species abundance, occupancy or survival. And most importantly, how Bayes has got anything to do with it. I've been hearing and seeing Bayesian stats, methods and networks in almost anything that involves data science, R and spatial stats, so I was quite miffed that I did not understand a thing. I am happy to inform that now, I do. Suffice to say that it was a bootcamp well-deserving of the 'limited seats' reputation, and the certificate really does feel like receiving a degree. It dwindles down to me realizing a few things I didn't know:
I did not know that we have been comparing probabilities instead of generating a 'combined' one based on a previous study all these years.
I did not know that Ronald Fisher had such strong influence that he could ban the usage of Bayesian inference by deeming it unscientific.
I did not know that, for Fisher, if the observation cannot be repeated many times and is uncertain, then, the probability cannot be determined - which is crazy! You can't expect to shoot virus into people many times and see them die to generate probability that it is deadly!
I did not know that Bayes' theorem actually combines the prior probability and the likelihood of the data you collected in the field for your current study to generate the posterior probability distribution! (See the little worked example after this list.)
I did not know that Thomas Bayes was a pastor and his theory was so opposed during his time. It was only after Ronald Fisher died that Bayesian inference gained favor, especially in the medical field.
I did not know...well...almost anything at all about statistics!
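A tiny worked example of that prior × likelihood → posterior idea, with made-up numbers and a conjugate Beta-Binomial setup rather than anything from the workshop itself: say an earlier study suggests a detection probability of around 0.3, and your new field season gives 11 detections in 20 visits.

```python
from scipy import stats

# Prior x likelihood -> posterior, with a conjugate Beta-Binomial example (made-up numbers).
prior = stats.beta(a=3, b=7)            # prior belief from an earlier study: mean 0.3

detections, visits = 11, 20             # new field data: 11 detections in 20 visits

# With a Beta prior and a Binomial likelihood, the posterior is Beta again:
posterior = stats.beta(a=3 + detections, b=7 + (visits - detections))

print("prior mean:     ", prior.mean())         # 0.30
print("MLE from data:  ", detections / visits)  # 0.55
print("posterior mean: ", posterior.mean())     # 14/30, about 0.47
```

The posterior mean lands between the prior mean and the raw MLE, pulled toward whichever side carries more information, which is exactly the 'combined' probability the first bullet above was getting at.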
It changed the way I look at statistics, basically. But I taught myself statistics for close to 9 years and of course I got it wrong most of the time; now I realize that for the umpteenth time. And for that, I hope the statistics powers that be forgive me. This boot camp was so effective, I believe, because of their effort in developing and executing activities that demonstrate the probability distribution models we were observing. In fact, I wrote down the activities next to each topic just to remember what the deal was. Some of the stuff covered includes the basics of the Binomial distribution, Poisson distribution, Normal/Gaussian distribution, posterior probability, Maximum Likelihood Estimate (MLE), AIC, BACI, SECR, occupancy and survival probability. Yes...exhausting, and I have to say, it wasn't easy. I could be listening, get distracted by a piece of paper falling for a fraction of a second, and find myself lost in the barrage of information. What saved me was the fact that we had quizzes to fill in to evaluate our understanding of the topic for the day, which we discussed first thing in the next session. Best of all, we were using R with the following packages: wiqid, unmarked, rjags and rasters. The best locations for camera trap installation were discussed as well, and all possible circumstances of your data (management and collection in the field) were covered rigorously.
For any of you guys out there who are doing wildlife studies, I believe that this boot camp contains quintessential information for you to understand in order to design your study better. Because once the data is produced, all we can do is dance around finding justifications for some common pitfalls that we could've countered quite easily.
In conclusion, not only did this workshop cast data analysis in a new light for me, but it also helped establish the correct steps and enunciate the requirements to get the most out of your data. And in my case, it has not only let me understand what could be going on with my pals who go out into the jungle to observe the wildlife first hand, it has also given me ideas on looking for resources that implement Bayesian statistics/methods in remote sensing and GI in general. Even though location analysis was not discussed beyond placing the locations of observations and occasions on the map, I am optimistic about further expanding what I understood into some of the stuff I'm planning: habitat suitability modeling, and how to not start image classification from scratch...every single time, if that's even possible.
For more information on more workshops by BCSS or wildlife study design and the tools involved, check out the links below:
Biodiversity Conservation Society Sarawak (BCSS) homepage: https://bcss.org.my/index.htm
BCSS statistical tutorials: https://bcss.org.my/tut/
Mike Meredith's home page: http://mikemeredith.net/
And do check out some of these cool websites that I have referred to for more information as well as practice. Just to keep those brain muscles in the loop with these 'new' concepts:
Statistical Rethinking: A Bayesian Course with Examples in R and Stan: https://github.com/rmcelreath/statrethinking_winter2019
Probability Concepts Explained: Introduction by Jonny Brooks-Bartlett: https://towardsdatascience.com/probability-concepts-explained-introduction-a7c0316de465 
Probability Concepts Explained: Maximum Likelihood Estimation by Jonny Brooks-Bartlett: https://towardsdatascience.com/probability-concepts-explained-maximum-likelihood-estimation-c7b4342fdbb1
Probability Concepts Explained: Bayesian Inference for Parameter Estimation by Jonny Brooks-Bartlett 
I'll be posting some of the things I am working on while utilizing the Bayesian stats. I'd love to see yours too!
P/S: Some people prefer to use base R with its simple interface, but if you're the type who works better with everything within your focal-view, I suggest you install RStudio. It's an IDE for R that helps to ease the 'anxiety' of using base R. 
P/S/S: Oh! Oh! This is the most important part of all. If you're using ArcGIS Pro like I do, did you know that it has R-Bridge that can enable the accessibility of R workspace in ArcGIS Pro? Supercool right?! If you want to know more on how to do that, check out this short 2 hour course on how to get the extension in and an example on how to use it: 
Using the R-Bridge: https://www.esri.com/training/catalog/58b5e417b89b7e000d8bfe45/using-the-r-arcgis-bridge/
douchebagbrainwaves · 7 years ago
WHY I'M SMARTER THAN LANGUAGE
We tend to write the software controlling those flying cars? But I always end up spending most of the members don't like it.1 Long but mistaken arguments are actually quite rare.2 Lexical closures provide a way to get startup ideas is hard. Wasting programmer time is the true inefficiency, not wasting machine time.3 Someone wrote recently that the drawback of Y Combinator wants to raise $250-500k. Language design is being taken over by hackers.4 A lot can change for a startup, it will sound plausible to a lot of money. Did they want French Vanilla or Lemon?
Organic growth seems to yield better technology and richer founders than the big bang method.5 I can call on any struct.6 The project either gets bogged down, or the startup will get bought, in which case problem solved, or the result is a free for all. Which means that even if we're generous to ourselves and assume that YC can on average triple a startup's expected value, we'd be taking the right amount of risk if only 30% of the startups were fundable would be a good idea, but you have to process video images depends on the rate at which you have to be facing off in a kind of business plan for a new type of number you've made up, you can envision companies as holes. I made for a panel discussion on programming language design at MIT on May 10,2001. What investors still don't get is how clueless and tentative great founders can seem at the very beginning.7 You have to approach it somewhat obliquely.
Usually this initial group of hackers using the language for others even to hear about it usually, because to prove yourself right you have to do is turn off the filters that usually prevent you from seeing them. This helps counteract the rule that in buying a house you should consider location first of all. I went to work for the love of it: amateurs. Which makes it easier to remember that it's an admirable thing to write great programs, even when this work doesn't translate easily into the conventional intellectual currency of research papers.8 In theory this is possible for species too, but it's a bad sign they even try. In some applications, presumably it could generate code efficient enough to run acceptably well on our hardware. The problem is the same reason Facebook has so far remained independent: acquirers underestimated them. If people are expected to behave well, they tend to be one of the only programming languages a surprising amount of effort has gone into preventing programmers from doing things that they think aren't good for you.9 I don't think we suck, but instead ask do we suck?10 And try to imagine what a transcript of the other guy's talk would be like teaching writing as grammar, without mentioning that its purpose is to refine the idea.11 But I'd rather use a site with primitive features and smart, nice users than a more advanced one whose users were idiots or trolls.12
Expressing ideas helps to form them. A company that an angel is willing to put $50,000 into at a valuation of a million can't take $6 million from VCs at that valuation. Afterward I put my talk online like I usually do.13 This is understandable with angels; they invest on a smaller scale. As a young founder under 23 say, are there things you and your friends would like to build great things for the companies they started would hire more employees as they grew. Having strings in a language where all the variables were the letter x with integer subscripts. Plus they're investing other people's money, and they even let kids in.14
It's due to the shape of the problem. If you want to notice startup ideas: those that grow organically out of your inbox?15 But I know the real reason we're so conservative is that we shouldn't be afraid to call the new Lisp Lisp.16 And it may be, this is the exact moment when technological progress stops. Currently the way VCs seem to operate is to invest in startups Y Combinator has funded. Then I do the same thing over and over seems kind of gross to me. To start with, investors are letting founders cash out partially.
And so interfaces tend not to change at all, and you'd get that fraction of big hits. That may be the greatest effect, in the sense that it lets hackers have their way with it. Essays should do the opposite. You might think that if they found a good deal of syntax in Common Lisp occurs in format strings; format is a language where you can spend as long thinking about each sentence as it takes to say it, a person hearing a talk can only spend as long thinking about each sentence as it takes to hear it. Is it necessary to take risks proportionate to the returns in this business. We wrote what was, 700 years ago, writing software pretty much meant writing software in general, because we'd be a long way toward fixing the problem: you'd soon learn what was expensive. The real question is, how far up the ladder of abstraction will parallelism go? It's pretty clear now that the healthiest diet is the one our peasant ancestors were forced to eat because they were so much more robust to have all the brains on the server. This is more pronounced among the very best hackers will like? But of course if you really get it, you can cry and say I can't and they won't even dare to take on ambitious projects. You're getting things done.17 But that's no different with any other tool.
And then there was the language and there was my program, written in the coming years will be Web-based software you can use any language you want, so if I can convince smart readers I must be pretty sharp. But business administration is not what you're doing in a startup founded by three former banking executives in their 40s who planned to outsource their product development—which to my mind is actually a lot riskier than investing in a pair of really smart 18 year olds—he couldn't be faulted, if it means anything at all, and you'd get that fraction of big hits.18 In one place I worked, we had a big board of dials showing what was happening to our web servers.19 But if you're living in the future. I decided the critical ingredients were rich people and nerds—investors and founders. I'm just saying you should think about who you really admire and hang out with them, instead of taking a class on, say, transportation or communications. Inventors of wonderful new things are often surprised to discover this, but you can't trust your judgment about that, so ignore it. When I go to a talk, you could fund everyone who seemed likely to succeed, it's hard not to think where it came from. How often does it happen that a rule works for thousands of years, then switches polarity?
Anything funny or gripping was ipso facto suspect, unless it was old enough to be rational and prefer the latter. When you know nothing, you have to be more than a language, or you have to get up on monday and go to work.20 At a good college, from which—because they're writing for a popular magazine—they then proceed to recoil in terror. How do you tell whether something is the germ of a giant company, or just a niche product? The sort of writing that attempts to persuade may be a necessary evil in a legal dispute, but it's not likely to have happened to any bigger than a cell. There is also the same: Darwinian. Those are like experiments that get inconclusive results.21 Translated into more straightforward language, this means: We're not investing in you, but we weren't interested in ecommerce per se. And it's not just the cost of reading it, and that is exactly the kind VCs won't touch. If there's something you're really interested in, you'll find valuable ones just sitting there waiting to be discovered right under our noses.
Notes
The optimal way to make that leap. 'Math for engineers' sucks, and this tends to happen fast, like storytellers, must have had a tiny.
Turn the other seed firms. For example, probably did more drugs in his early twenties compressed into the work that seems formidable from the CIA runs a venture fund called In-Q-Tel that is allowing economic inequality to turn down some good ideas buried in Bubble thinking. Geshke and Warnock only founded Adobe because Xerox ignored them. Maybe markets will eventually get comfortable with potential earnings.
It seemed better to embrace the fact by someone else start those startups. The Socialist People's Democratic Republic of X is probably part of wisdom. I hadn't had much success in doing a bad sign if you are unimportant. Or rather, where many of the political pressure against Airbnb than hotel companies.
They thought I was there when it converts. Perhaps the most important information about competitors is what approaches like Brightmail's will degenerate into once spammers are pushed into using mad-lib techniques to generate series A rounds from top VC funds whether it was putting local grocery stores out of school. They thought I was a very noticeable change in the long term than one level of links. Instead of bubbling up from the revenue-collecting half of it.
A round, that they kill you, it becomes an advantage to be higher, as on a saturday, he saw that I see a lot of people like them—people who are both.
Viaweb, he'd get his ear pierced. If you have more options. That's the lower bound to its precision. Now we don't have to solve this problem by having a gentlemen's agreement with the solutions.
Investors are one of them is that if a company tuned to exploit it.
But it turns out to do with the New Deal but with World War II the tax codes were so bad that they violate current startup fashions. As well as problems that have economic inequality.
No one seems to have gotten away with the high-minded Edwardian child-heroes of Edith Nesbit's The Wouldbegoods. I were doing Bayesian filtering in a bar. Professors and politicians live within socialist eddies of the next round, though more polite, was one firm that wanted to have to deliver these sentences as if a bunch of other people. No Logo, Naomi Klein says that I know when this happened because it depends on a valuation cap is merely a complicated but pointless collection of stuff to be a founder; and with that of whatever they copied.
Apple's products but their policies. I think in general we've done ok at fundraising is because other companies made all the East Coast VCs. 35 companies that tried that.
If you walk into a few additional sources on their companies. At Princeton, 36% of the conversion of buildings not previously public, like good scientists, motivated less by financial rewards than by selling them overpriced components. You need to fix once it's big, plus they are like, and that injustice is what we need to get users to recruit manually—is probably 99% cooperation.
Teenagers don't tell the craziest lies about me. It seemed better to read a draft of this.
In fact, this thought experiment works for nationality and religion too. To a 3 year old son, you'll be well on your board, there is some kind of gestures you use in representing physical things.
Some of the world in verse.
I have so far has trained them to ignore what your GPA was.
In 1995, when Subject foo degenerates to just foo, what if they did not become romantically involved till afterward. Some are merely ugly ducklings in the sample might be enough. The actual sentence in the same thing twice.
What made Google Google is that Digg is Slashdot with voting instead of reacting. Some VCs seem to be when I switch in mid-sentence, but starting a business, having sold all my shares earlier this year. Even as late as 1984. And yet I think it's mainly not having to have this second self keep a journal, and I don't think they'll be able to formalize a small amount, or Microsoft could not process it.
03%.
These points don't apply to the hour Google was founded, wouldn't offer to invest in these funds have no real substance.
Only founders of failing startups would even be working on is a dotted line on a road there are lots of type II startups spread: all you know Apple originally had three founders? They want so much better to read stories. But I'm convinced there were, like wages and productivity, but trained on corpora of stupid and non-broken form, that it might be?
Some of the word procrastination to describe what's happening till they measure their returns. The Civil Service Examinations of Imperial China, during the 2002-03 season was 4. Part of the world barely affects me.
kristinsimmons · 5 years ago
Text
Is the COVID-19 Antibody Seroprevalence in Santa Clara County really 50-85 fold higher than the number of confirmed cases?
By CHRISTOS ARGYROPOULOS
I am writing this blog post (the first after nearly two years!) in lockdown mode because of the rapidly spreading SARSCoV2 virus, the causative agent of the COVID19 disease (a poor choice of a name, since the disease itself is really SARS on steroids). One interesting feature of this disease is that a large number of patients will manifest minimal or no symptoms ("asymptomatic" infections), a state which must clearly be distinguished from the presymptomatic phase of the infection. In the latter, many patients who will eventually go on to develop the more serious forms of the disease have minimal symptoms. This is in contrast to asymptomatic patients, who will never develop anything more bothersome than mild symptoms ("sniffles"), for which they will never seek medical attention.
Ever since the early phases of the COVID19 pandemic, a prominent narrative postulated that asymptomatic infections are much more common than symptomatic ones. Therefore, calculations such as the Case Fatality Rate (CFR = deaths over all symptomatic cases) mislead about the Infection Fatality Rate (IFR = deaths over all cases). Subthreads of this narrative go on to postulate that the lockdowns which have been implemented widely around the world are overkill because COVID19 is no more lethal than the flu, when lethality is calculated over ALL infections. Whereas the politicization of the lockdown argument is of no interest to the author of this blog (after all the virus does not care whether its victim is rich or poor, white or non-white, Westerner or Asian), estimating the prevalence of individuals who were exposed to the virus but never developed symptoms is important for public health, epidemiological and medical care reasons. Since these patients do not seek medical evaluation, they will not be detected by acute care tests (viral loads in PCR-based assays). However, such patients may be detected after the fact by looking for evidence of past infection, in the form of circulating antibodies in the patients' serum.
I was thus very excited to read about the release of a preprint describing a seroprevalence study in Santa Clara County, California. This preprint described the results of a cross-sectional examination of residents of Santa Clara County, with a lateral flow immunoassay (similar to a home pregnancy kit) for the presence of antibodies against the SARSCoV2 virus. The presence of antibodies signifies that the patient was not only exposed at some point to the virus, but that this exposure led to an actual infection to which the immune system responded by forming antibodies. These resulting antibodies persist for far longer than the actual infection and thus provide an indirect record of who was infected. More importantly, such antibodies may be the only way to detect asymptomatic infections, because these patients will not manifest any symptoms that will make them seek medical attention while they are actively infected. Hence, the premise of the Santa Clara study is a solid one and in fact we need many more of these studies. But did the study actually deliver? Let's take a deep dive into the preprint.
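To make the CFR/IFR distinction concrete, here is a toy calculation in Python with entirely made-up numbers (purely illustrative; none of these figures come from the preprint):

# Toy illustration of CFR vs IFR with made-up numbers (not data from the preprint).
deaths = 10
symptomatic_cases = 1_000         # detected, symptomatic infections
asymptomatic_cases = 4_000        # hypothetical undetected infections

cfr = deaths / symptomatic_cases                          # 0.01  -> 1.0%
ifr = deaths / (symptomatic_cases + asymptomatic_cases)   # 0.002 -> 0.2%
print(f"CFR = {cfr:.1%}, IFR = {ifr:.1%}")

The larger the pool of undetected asymptomatic infections, the further the IFR falls below the CFR, which is exactly why the size of that pool matters so much.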
What did the preprint claim to show? The authors' major conclusions are:
The population prevalence of SARS-CoV-2 antibodies in Santa Clara County implies that the infection is much more widespread than indicated by the number of confirmed cases.
Population prevalence estimates can now be used to calibrate epidemic and mortality projections.
Both conclusions rest upon a calculation that claims:
These prevalence estimates represent a range between 48,000 and 81,000 people infected in Santa Clara County by early April, 50-85-fold more than the number of confirmed cases.
https://www.medrxiv.org/content/10.1101/2020.04.14.20062463v1
If these numbers were true, they would constitute a tectonic shift in our understanding of this disease. For starters, this would imply that the virus is substantially more contagious than people think (though recent analyses of both Chinese and European data also show this point quite nicely). Secondly, if the number of asymptomatic infections is very high, then perhaps we are close to achieving herd immunity and thus perhaps we can relax the lockdowns. Finally, if the asymptomatic infections are numerous, then the disease is not that lethal and thus the lockdowns were an overreaction, which killed the economy for the "sniffles" or the "flu". Since the authors' argument rests upon a calculation, it is important to ensure that the calculation was done correctly. If we find deficiencies in the calculation, then the authors' conclusions become a collapsing house of cards. Statistical calculations of this sort can mislead as a result of poor data AND/OR poor calculation. While this blog post focuses on the calculation itself, there are certain data deficiencies that will be pointed out along the way.
How did the authors carry out their research?
The high-level description of the approach the authors took may be found in their abstract:
 We measured the seroprevalence of antibodies to SARS-CoV-2 in Santa Clara County. Methods On 4/3-4/4, 2020, we tested county residents for antibodies to SARS-CoV-2 using a lateral flow immunoassay. Participants were recruited using Facebook ads targeting a representative sample of the county by demographic and geographic characteristics. We report the prevalence of antibodies to SARS-CoV-2 in a sample of 3,330 people, adjusting for zip code, sex, and race/ethnicity. We also adjust for test performance characteristics using 3 different estimates: (i) the test manufacturer’s data, (ii) a sample of 37 positive and 30 negative controls tested at Stanford, and (iii) a combination of both
https://www.medrxiv.org/content/10.1101/2020.04.14.20062463v1
First and foremost, we note the substantial degree of selection bias inherent in the use of social media for recruitment of participants in the study. This resulted in a cohort that was quite non-representative of the residents of Santa Clara county, and in fact the authors had to rely on post-stratification weighting to come up with a representative cohort. These methods can produce reasonable results if the weighting scheme is selected carefully. Andrew Gelman analyzed the scheme and strategy adopted by the study authors. His blog post is an excellent read about the many design issues of the Santa Clara study, and you should read it if you have not already. There are many other sources of selection bias identified by other commentators:
I don't think any of the primary authors of the Stanford seroprevalence study are on Twitter, but there are some questions about the work that I haven't seen asked elsewhere that I would love to see addressed at some point. https://t.co/sJuOnt4mTk
— mbeisen (@mbeisen) April 20, 2020
Taken together, these comments suggest some serious deficiencies in the source data. Notwithstanding these issues, a major question still remains: whether the authors properly accounted for the assay inaccuracy in their calculations. Gelman put it succinctly as follows:
This is the big one. If X% of the population have the antibodies and the test has an error rate that’s not a lot lower than X%, you’re in big trouble. This doesn’t mean you shouldn’t do testing, but it does mean you need to interpret the results carefully.
https://statmodeling.stat.columbia.edu/2020/04/19/fatal-flaws-in-stanford-study-of-coronavirus-prevalence/
The shorter version of this argument would thus characterize the positive tests in this “bombshell” preprint as:
This is the bombshell #COVID19 preprint of the day COVID-19 Antibody Seroprevalence in Santa Clara County, California https://t.co/jvqbgMwFT7 Population prevalence of #SARSCoV2 seropositivity between 2.5-4.2% (implication this virus must have an effectively infinite R0) pic.twitter.com/lkGp3MypfH
— ChristosArgyropoulos (@ChristosArgyrop) April 17, 2020
A related issue concerns the uncertainty estimates reported by the authors, which are a byproduct of the calculations. Again, this issue was highlighted by Gelman:
3. Uncertainty intervals. So what’s going on here? If the specificity data in the paper are consistent with all the tests being false positives—not that we believe all the tests are false positives, but this suggests we can’t then estimate the true positive rate with any precision—then how do they get a confidence nonzero estimate of the true positive rate in the population?
who, similarly to Rupert Beale, goes on to conclude:
First, if the specificity were less than 97.9%, you’d expect more than 70 positive cases out of 3330 tests. But they only saw 50 positives, so I don’t think that 1% rate makes sense. Second, the bit about the sensitivity is a red herring here. The uncertainty here is pretty much entirely driven by the uncertainty in the specificity.
To summarize, the authors of the preprint conclude that there are many, many more infections in Santa Clara than those captured by the current testing, while others looking at the same data and the characteristics of the test employed think that asymptomatic infections are not as frequent as the Santa Clara preprint claims. In the next paragraphs, I will reanalyze the summary data reported by the preprint authors and show they are most definitely wrong: while there are asymptomatic COVID19 presentations, their number is nowhere close to being 50-80 fold higher than the symptomatic ones, as the authors claim.
A Bayesian Re-Analysis of the Santa Clara Seroprevalence Study
There are three pieces of data in the preprint that are relevant to answering the preprint's question:
The number of positives (y = 50) out of the N = 3,330 tests performed in Santa Clara
The characteristics of the test used, as reported by the manufacturer. This is conveniently given as a two-by-two table cross-classifying the readout of the assay (positive: +ve or negative: -ve) in disease +ve and -ve gold standard samples. In these samples, we assume that the disease status of the individuals assayed is known perfectly. Hence a less-than-perfect alignment of the assay results with the ground truth points towards assay imperfections or inaccuracies. For example, a perfect assay would classify all COVID19 +ve patients as assay positive and all COVID19 -ve patients as assay negative.
             Assay +ve   Assay -ve
COVID19 +ve      78           7
COVID19 -ve       2         369
Premier Biotech Lateral Flow COVID19 Assay, Manufacturer Data
The characteristics of the test in a local, Stanford cohort of patients. This may also be given as a 2 x 2 table (a short snippet after the two tables works out the sensitivities and specificities they imply):
             Assay +ve   Assay -ve
COVID19 +ve      25          12
COVID19 -ve       0          30
Premier Biotech Lateral Flow COVID19 Assay, Stanford Data
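As a quick sanity check on these two tables, the point estimates and exact (Clopper-Pearson) 95% intervals can be computed directly. This is an illustrative Python sketch only; the post's own analysis is done in R and Stan, and the intervals quoted in the preprint may have been computed slightly differently.

# Point estimates and Clopper-Pearson 95% intervals for the two gold standard datasets.
# Counts are transcribed from the 2x2 tables above.
from scipy.stats import beta

def clopper_pearson(k, n, alpha=0.05):
    """Exact binomial confidence interval for k successes out of n trials."""
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi

datasets = {
    "manufacturer sensitivity": (78, 85),    # 78 of 85 known positives flagged +ve
    "manufacturer specificity": (369, 371),  # 369 of 371 known negatives flagged -ve
    "Stanford sensitivity":     (25, 37),
    "Stanford specificity":     (30, 30),
}

for name, (k, n) in datasets.items():
    lo, hi = clopper_pearson(k, n)
    print(f"{name}: {k}/{n} = {k/n:.1%} (95% CI {lo:.1%} - {hi:.1%})")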
We will build the analysis of these three pieces of data in stages starting with the analysis of the assay performance in the gold standard samples.
Characterizing test performance
Eyeballing the test characteristics in the two distinct gold standard sample collections immediately shows that the test behaved differently in the gold standard samples used by the manufacturer and by the Stanford team. In particular, the sensitivity (aka the true positive rate) is much higher in the manufacturer samples vs. the Stanford samples, i.e. 78/85 vs 25/37, respectively. Conversely, the specificity (or true negative rate) is smaller in the manufacturer dataset than in the Stanford gold standard data: 369/371 is certainly smaller than 30/30. Perhaps the reason for these differences may be found in how Stanford constructed their gold standard samples:
Test Kit Performance
The manufacturer’s performance characteristics were available prior to the study (using 85 confirmed positive and 371 confirmed negative samples). We conducted additional testing to assess the kit performance using local specimens. We tested the kits using sera from 37 RT-PCR-positive patients at Stanford Hospital that were also IgG and/or IgM-positive on a locally developed ELISA assay. We also tested the kits on 30 pre-COVID samples from Stanford Hospital to derive an independent measure of specificity.
Perhaps the ELISA assay developed in-house by the Stanford scientists is more sensitive than the lateral flow assay used in the study. Or the Stanford team used the lateral flow assay in a way that decreased its sensitivity. While lateral flow immunoassays are, conceptually, "yes" or "no" assays, there is still room for suspect or intermediate results (as shown below for the pregnancy test).
[Figure: Operation of a lateral flow immunoassay]
While no information is given about how the intermediate results were handled (or whether the assay itself generates more than one grade of intermediate results), it is conceivable that these were handled differently by the investigators. For example, it is possible that intermediate results the manufacturer labelled as positive in their validation were labeled as negative by the investigators. Even more likely, the participant samples were handled differently by the investigators, so that the antibody levels required to give a positive result differed in the Santa Clara study relative to the tests run by the manufacturer. In either case, the Receiver Operating Characteristic (ROC) curve in the field differs from the manufacturer's. Furthermore, we have very little evidence that it will not differ again in future applications. Finally, there is the possibility of good old-fashioned sampling variability in the performance of the assay in the gold standard samples, without any actual substantive differences in the assay performance. Hence, the two main possibilities we considered are:
Common Test Characteristic Model: We assume that random variation in the performance of the test in the two gold standards is at play, and we estimate a common sensitivity and specificity over the two gold standard sets. In probabilistic terms, we will assume the following model for the true positive and negative rates in the two gold standard datasets:
TP_i ~ Binomial(TP_i + FN_i, Sens) and TN_i ~ Binomial(TN_i + FP_i, Spec).
In these expressions, TP, FN, TN, FP stand for the True Positive, False Negative, True Negative and False Positive counts respectively, while the subscript i indexes the two gold standard datasets.
Shifted along (the) ROC Model: In this model, we assume a bona fide shift in the characteristics of the assay between the two gold standard sets. This variation in characteristics is best conceptualized as a shift along the straight line in the binormal (normal deviate) plot. This plot transforms the standard ROC plot of sensitivity vs. 1-specificity so that the familiar bent ROC curves look like straight lines. Operationally, we assume that there will be run-specific variation in the assay characteristics when it is applied to future samples. The performance of the assay in the two gold standard sets provides a measure of this variability, which may resurface in subsequent use of the assay. When analyzing the data from the Santa Clara survey, we will acknowledge this variability by assuming that the assay performance (sensitivity/specificity) in this third application (red in the figure below) interpolates the performance noted in the two gold standards (black).
[Figure: binormal (normal deviate) ROC plot showing the assay performance in the two gold standard sets (black) and the interpolated performance assumed for the Santa Clara survey (red)]
The shifted-along-ROC model is specified by fitting separate specificities and sensitivities for the two gold standard datasets, which are then transformed into the binormal plot (probit) scale, averaged, and then transformed back to the probability scale to calculate an average sensitivity and specificity. For computational reasons, we parameterize the models for the sensitivity and specificity in logit space. This parameterization also allows us to explore the sensitivity of the model to the choice of priors for the Bayesian analysis.
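To make the averaging step concrete, here is a small Python sketch of the probit-scale midpoint, using the raw sensitivities from the two gold standard tables (illustrative only; in the actual model these quantities are estimated under priors rather than plugged in):

# Probit-scale ("binormal") midpoint of the two raw sensitivities.
# In the Stan model this is done on estimated quantities (Phi, inv_Phi, inv_logit);
# here we simply plug in the raw proportions to illustrate the transformation.
from scipy.stats import norm

sens_manufacturer = 78 / 85   # ~0.918
sens_stanford = 25 / 37       # ~0.676

sens_mid = norm.cdf(0.5 * (norm.ppf(sens_manufacturer) + norm.ppf(sens_stanford)))
print(f"midpoint sensitivity on the probit scale: {sens_mid:.3f}")  # ~0.82

# The same plug-in trick cannot be applied to the raw specificities, because the Stanford
# value of 30/30 = 1.0 maps to plus infinity on the probit scale, which is one reason the
# model works with estimated logits under priors instead of raw proportions.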
Model for the observed positive rates
The model for the observed positive counts is a binomial law relating the positive tests in the Santa Clara sample to the total number of tests:

casesSantaClara ~ Binomial(NSantaClara, Prev*Sens + (1 - Prev)*(1 - Spec))

Priors

For the common test characteristic model, we will assume standard, uniform, non-informative priors for the Sensitivity, Specificity and Prevalence. The model may be fit with the following STAN program:
functions {
}
data {
  int casesSantaClara;      // Cases in Santa Clara
  int NSantaClara;          // ppl sampled in Santa Clara
  int TPStanford;           // true positives in Stanford dataset
  int NPosStanford;         // Total number of positive cases in Stanford dataset
  int TNStanford;           // true negatives in Stanford dataset
  int NNegStanford;         // Total number of negative cases in Stanford dataset
  int TPManufacturer;       // true positives in Manufacturer dataset
  int NPosManufacturer;     // Total number of positive cases in Manufacturer dataset
  int TNManufacturer;       // true negatives in Manufacturer dataset
  int NNegManufacturer;     // Total number of negative cases in Manufacturer dataset
}
transformed data {
}
parameters {
  real<lower=0.0,upper=1.0> Sens;
  real<lower=0.0,upper=1.0> Spec;
  real<lower=0.0,upper=1.0> Prev;
}
transformed parameters {
}
model {
  casesSantaClara ~ binomial(NSantaClara, Prev*Sens + (1-Prev)*(1-Spec));
  TPStanford ~ binomial(NPosStanford, Sens);
  TPManufacturer ~ binomial(NPosManufacturer, Sens);
  TNStanford ~ binomial(NNegStanford, Spec);
  TNManufacturer ~ binomial(NNegManufacturer, Spec);
  // uniform(0,1) priors on the prevalence, sensitivity and specificity
  Prev ~ uniform(0.0, 1.0);
  Sens ~ uniform(0.0, 1.0);
  Spec ~ uniform(0.0, 1.0);
}
generated quantities {
  real totposrate;
  real rateratio;
  totposrate = Prev*Sens + (1-Prev)*(1-Spec);
  rateratio = Prev/totposrate;
}
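The key line in the model block is the mixture for the apparent positive rate, Prev*Sens + (1-Prev)*(1-Spec). A tiny Python illustration with made-up prevalence values (not results of the analysis) shows why the specificity ends up dominating when the true prevalence is low:

# How the apparent (observed) positive rate depends on prevalence, sensitivity and
# specificity. The prevalence values below are made up for illustration.
def apparent_rate(prev, sens, spec):
    return prev * sens + (1 - prev) * (1 - spec)

sens, spec = 0.80, 0.995    # in the ballpark of the combined estimates quoted later
for prev in (0.0, 0.005, 0.01, 0.02):
    print(f"true prevalence {prev:.1%} -> apparent positive rate {apparent_rate(prev, sens, spec):.2%}")

# Even at zero true prevalence the assay would flag ~0.5% of samples, which is why a
# specificity that is uncertain at the ~1% level matters so much for an observed rate of
# 50/3330 ~ 1.5%.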
For the shifted-along-ROC model, we explored two different settings of noninformative priors for the logit of sensitivity and specificity:
a logistic(0,1) prior for the logit of the sensitivities and specificities, which is mathematically equivalent to a uniform prior for the sensitivity and specificity on the probability scale.
a normal(0,1.6) prior for the logits. While these two densities are very similar, the normal (black) has thinner tails, shrinking the probability estimates more than the logistic one (red); a small calculation after the figure below illustrates this shrinkage.
[Figure: Comparison of logistic(0,1) vs normal(0,1.6) prior densities]
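To quantify the "thinner tails" remark, one can compare how much prior mass each choice places on extreme probabilities, say above 0.99, after pushing the logit through the inverse-logit transform. A short Python check (not part of the original analysis):

# Prior mass above a probability of 0.99 under the two priors on the logit scale.
from scipy.stats import logistic, norm
from scipy.special import logit

threshold = logit(0.99)                       # ~4.595 on the logit scale
p_logistic = logistic(loc=0, scale=1).sf(threshold)
p_normal = norm(loc=0, scale=1.6).sf(threshold)

print(f"P(p > 0.99) under logistic(0,1): {p_logistic:.4f}")   # 0.0100 (uniform on p)
print(f"P(p > 0.99) under normal(0,1.6): {p_normal:.4f}")     # ~0.002, roughly 5x less mass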
The shifted-ROC model is fit by the following STAN code (the logistic(0,1) prior variant is inlined below; the normal(0,1.6) variant simply swaps the priors on logitSens and logitSpec)
functions {
}
data {
  int casesSantaClara;      // Cases in Santa Clara
  int NSantaClara;          // ppl sampled in Santa Clara
  int TPStanford;           // true positives in Stanford dataset
  int NPosStanford;         // Total number of positive cases in Stanford dataset
  int TNStanford;           // true negatives in Stanford dataset
  int NNegStanford;         // Total number of negative cases in Stanford dataset
  int TPManufacturer;       // true positives in Manufacturer dataset
  int NPosManufacturer;     // Total number of positive cases in Manufacturer dataset
  int TNManufacturer;       // true negatives in Manufacturer dataset
  int NNegManufacturer;     // Total number of negative cases in Manufacturer dataset
}
transformed data {
}
parameters {
  real<lower=0.0,upper=1.0> Prev;
  real logitSens[2];
  real logitSpec[2];
}
transformed parameters {
  real<lower=0.0,upper=1.0> Sens;
  real<lower=0.0,upper=1.0> Spec;
  // Use the sensitivity and specificity given by the mid point between the two kits.
  // The mid point is found in inv_Phi space of the graph relating Sensitivity to 1-Specificity.
  // Note that we break down the calculations involved in going from logit space to inv_Phi
  // space in the generated quantities section for clarity.
  Sens = Phi(0.5*(inv_Phi(inv_logit(logitSens[1])) + inv_Phi(inv_logit(logitSens[2]))));
  Spec = Phi(0.5*(inv_Phi(inv_logit(logitSpec[1])) + inv_Phi(inv_logit(logitSpec[2]))));
}
model {
  casesSantaClara ~ binomial(NSantaClara, Prev*Sens + (1-Prev)*(1-Spec));
  TPStanford ~ binomial_logit(NPosStanford, logitSens[1]);
  TPManufacturer ~ binomial_logit(NPosManufacturer, logitSens[2]);
  TNStanford ~ binomial_logit(NNegStanford, logitSpec[1]);
  TNManufacturer ~ binomial_logit(NNegManufacturer, logitSpec[2]);
  logitSens ~ logistic(0.0, 1.0);
  logitSpec ~ logistic(0.0, 1.0);
  Prev ~ uniform(0.0, 1.0);
}
generated quantities {
  // various post sampling estimates
  real totposrate;        // Total positive rate
  real kitSens[2];        // Sensitivity of the kits
  real kitSpec[2];        // Spec of the kits
  real rateratio;         // ratio of prevalence to total positive rate
  totposrate = Prev*Sens + (1-Prev)*(1-Spec);
  kitSens = inv_logit(logitSens);
  kitSpec = inv_logit(logitSpec);
  rateratio = Prev/totposrate;
}
Estimation of the Prevalence of COVID19 Seropositivity in Santa Clara County
Preprint author analyses and interpretation
Table 2 of the preprint summarizes the estimates of the prevalence of seropositivity in Santa Clara. This table is reproduced below for ease of reference:
Approach                                    Point Estimate (%)   95% Confidence Interval
Unadjusted                                        1.50                1.11 – 1.97
Population-adjusted                               2.81                2.24 – 4.16
Population and test performance adjusted:
  Manufacturer's data                             2.49                1.80 – 3.17
  Stanford data                                   4.16                2.58 – 5.70
  Manufacturer + Stanford data                    2.75                2.01 – 3.49
Estimated Prevalence of COVID19 Seropositivity, Bendavid et al
The unadjusted estimate is computed on the basis of the observed positive counts (50/3,330 tests), and the population-adjusted estimate is obtained by accounting for the non-representativeness of the sample using post-stratification weights. Finally, the last 3 rows give the estimates of the prevalence while accounting for the test characteristics. The claim that the prevalence of COVID19 infection is 50-85 fold higher than the confirmed case count rests on the estimated point prevalences of 2.49% and 4.16%, respectively. Computation of these estimates requires the sensitivity and specificity of the immunoassay, which the authors quote as:
Our estimates of sensitivity based on the manufacturer’s and locally tested data were 91.8% (using the lower estimate based on IgM, 95 CI 83.8-96.6%) and 67.6% (95 CI 50.2-82.0%), respectively. Similarly, our estimates of specificity are 99.5% (95 CI 98.1-99.9%) and 100% (95 CI 90.5-100%). A combination of both data sources provides us with a combined sensitivity of 80.3% (95 CI 72.1-87.0%) and a specificity of 99.5% (95 CI 98.3-99.9%).
The results obtained are crucially dependent upon the values of the sensitivity and specificity, something that the authors explicitly state in the discussion:
For example, if new estimates indicate test specificity to be less than 97.9%, our SARS-CoV-2 prevalence estimate would change from 2.8% to less than 1%, and the lower uncertainty bound of our estimate would include zero. On the other hand, lower sensitivity, which has been raised as a concern with point-of-care test kits, would imply that the population prevalence would be even higher. New information on test kit performance and population should be incorporated as more testing is done and we plan to revise our estimates accordingly.
While the authors claim they have accounted for these test characteristics, the method of adjustment is not spelled out in detail in the main text. Only by looking into the statistical appendix do we find the following disclaimer about the approximate formula used:
There is one important caveat to this formula: it only holds as long as (one minus) the specificity of the test is higher than the sample prevalence. If it is lower, all the observed positives in the sample could be due to false-positive test results, and we cannot exclude zero prevalence as a possibility. As long as the specificity is high relative to the sample prevalence, this expression allows us to recover population prevalence from sample prevalence, despite using a noisy test.
In other words, the authors are making use of an approximation that relies on an assumption about the study results that may or may not be true. But if one is using such an approximation, then one has already decided what the results and the test characteristics should look like. We will not make such an assumption in our own calculations.
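The preprint does not spell out the adjustment formula in the main text, but the appendix caveat quoted above is what one would say about the standard test-performance correction, prevalence = (apparent rate + specificity - 1) / (sensitivity + specificity - 1), often attributed to Rogan and Gladen. Assuming that is (at least approximately) what was used, a quick Python check with the inputs quoted in this post lands in the neighborhood of the Table 2 values and also reproduces the authors' own 97.9% specificity remark:

# Standard correction of an apparent positive rate for imperfect sensitivity/specificity
# (Rogan-Gladen). Whether the preprint used exactly this formula is an assumption here;
# the inputs below are the values quoted in the post.
def corrected_prevalence(apparent, sens, spec):
    return (apparent + spec - 1) / (sens + spec - 1)

apparent = 0.0281  # population-adjusted positive rate from Table 2

print(corrected_prevalence(apparent, sens=0.918, spec=0.995))  # ~0.025, cf. 2.49% (manufacturer)
print(corrected_prevalence(apparent, sens=0.676, spec=1.000))  # ~0.042, cf. 4.16% (Stanford)
print(corrected_prevalence(apparent, sens=0.803, spec=0.995))  # ~0.029, cf. 2.75% (combined)

# The authors' sensitivity analysis: at a specificity of 97.9% the corrected estimate
# drops below 1%, as stated in the discussion.
print(corrected_prevalence(apparent, sens=0.803, spec=0.979))  # ~0.009

The small discrepancies against the table suggest the exact adjustment may differ in detail, which is precisely why the fully Bayesian treatment below is preferable to a plug-in formula.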
Bayesian Analysis
The Bayesian analyses we report here roughly correspond to a combination that Bendavid et al did not consider, namely an analysis using only the test characteristics, but without the post-stratification weights. Even though a Bayesian analysis that used such weights is possible, the authors do not report the number of positives and negatives in each stratum, along with the relevant weights. If this information had been made available, then we could repeat the same calculations and see what we obtain. While I can't reproduce the exact analysis, examination of Table 2 in Bendavid et al suggests that weighting should increase the prevalence by a "fudge" factor. For the purpose of this section we take the fudge factor to be equal to the ratio of the prevalence after population adjustment over the unadjusted prevalence: 2.81/1.50 ~ 1.87. For fitting the Bayesian analyses, I used the No-U-Turn Sampler implemented in STAN; I simulated five chains, using 5,000 warm-up iterations and 1,000 post-warmup samples to compute summaries. Rhat for all parameters was <1.01 for all models (R and STAN code for all analyses are provided here to ensure reproducibility).
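The post provides R and Stan code for the actual analysis; purely as an illustration of the moving parts, here is how the common-parameter model above could be fit from Python with cmdstanpy, using the counts quoted in this post and the sampler settings just described (the Stan file name is an assumption):

# Illustrative only: the author's analysis is in R/Stan. This assumes the first Stan
# program above has been saved as "common_model.stan".
from cmdstanpy import CmdStanModel

data = {
    "casesSantaClara": 50, "NSantaClara": 3330,
    "TPStanford": 25, "NPosStanford": 37,
    "TNStanford": 30, "NNegStanford": 30,
    "TPManufacturer": 78, "NPosManufacturer": 85,
    "TNManufacturer": 369, "NNegManufacturer": 371,
}

model = CmdStanModel(stan_file="common_model.stan")
fit = model.sample(data=data, chains=5, iter_warmup=5000, iter_sampling=1000, seed=2020)

summary = fit.summary()
# Posterior means and R_hat for the quantities of interest
print(summary.loc[["Prev", "Sens", "Spec", "totposrate", "rateratio"]])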
Sensitivities and specificities for the three Bayesian analyses (common-parameters, shifted-along-ROC with the logistic prior, and shifted-along-ROC with the normal prior) are shown below:
Sensitivity
Model                          Mean     Median   95% Credible Interval
Common Parameters              0.836    0.838    0.768 – 0.897
Shifted-along-ROC (logistic)   0.811    0.813    0.733 – 0.882
Shifted-along-ROC (normal)     0.810    0.813    0.734 – 0.878

Specificity
Model                          Mean     Median   95% Credible Interval
Common Parameters              0.993    0.994    0.986 – 0.998
Shifted-along-ROC (logistic)   0.991    0.991    0.983 – 0.998
Shifted-along-ROC (normal)     0.989    0.988    0.982 – 0.995
Bayesian Analysis of Assay Characteristics
The three models seem to yield similar, yet not identical, estimates of the assay characteristics, and one may even expect that the estimated prevalences would be somewhat similar to the analyses by the preprint authors. The estimated prevalence by the three approaches (without application of the fudge factor) is shown below:
Model                          Mean     Median   95% Credible Interval
Common Parameters              1.03     1.05     0.16 – 1.88
Shifted-along-ROC (logistic)   0.81     0.81     0.50 – 1.84
Shifted-along-ROC (normal)     0.56     0.50     0.02 – 1.46
Model-Estimated Prevalence (%)
Application of the fudge factor would increase the point estimates to 1.93%, 1.52% and 1.05%, which are more than 50% smaller than the prevalence estimates computed by Bendavid et al. To conclude our Bayesian analyses, we compared the marginal likelihoods of the three models by means of the bridge sampler. This computation allows us to roughly see which of the three model/prior combinations is supported by the data. This analysis provided overwhelming support for the Shifted-along-ROC model with the normal (shrinkage) prior:
Model                          Posterior Probability
Common Parameters              0.032
Shifted-along-ROC (logistic)   0.030
Shifted-along-ROC (normal)     0.938
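The bridge-sampling code itself is not shown in the post; once the (log) marginal likelihoods are in hand, though, posterior model probabilities under equal prior odds are just a normalization. A minimal Python sketch with placeholder log marginal likelihoods (the actual values are not reported in the post):

# Posterior model probabilities from log marginal likelihoods, assuming equal prior
# probability for each model. The numbers below are placeholders, not the real values.
import numpy as np
from scipy.special import logsumexp

log_ml = {
    "Common Parameters":            -105.0,   # hypothetical
    "Shifted-along-ROC (logistic)": -105.1,   # hypothetical
    "Shifted-along-ROC (normal)":   -101.6,   # hypothetical
}

logs = np.array(list(log_ml.values()))
post = np.exp(logs - logsumexp(logs))          # softmax over the models
for name, p in zip(log_ml, post):
    print(f"{name}: {p:.3f}")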
Conclusions
In this rather long post, I undertook a Bayesian analysis of the seroprevalence survey in Santa Clara county. Contrary to the authors' assertions, this formal Bayesian analysis suggests a much lower seroprevalence of COVID19 infections. The estimated fold increase over confirmed cases is only about 20 (point estimate), rather than 50-80, with a 95% credible interval of 0.75-55 fold for the most probable model.
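As a back-of-the-envelope check on that roughly 20-fold figure, one can use only numbers already quoted above: the preprint's 48,000-81,000 infections correspond to 2.49-4.16% prevalence and to 50-85 times the confirmed cases, which together imply a county population of roughly 1.9 million and roughly 950-960 confirmed cases. A rough Python sketch (all inputs derived from those quoted figures, so treat the output as approximate):

# Back-of-the-envelope fold increase over confirmed cases, using quantities implied by
# the preprint's own numbers (approximate by construction).
county_population = 48_000 / 0.0249      # ~1.93 million
confirmed_cases = 48_000 / 50            # ~960

prev_bayes = 0.0105                      # fudge-adjusted point estimate, normal-prior model
implied_infections = prev_bayes * county_population
print(f"implied infections: {implied_infections:,.0f}")                              # ~20,000
print(f"fold increase over confirmed: {implied_infections / confirmed_cases:.0f}")   # ~21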
The major driver of the prevalence is the specificity of the assay (not the sensitivity), so particular choices for this parameter will have a huge impact on the estimated prevalence of COVID19 infection based on seroprevalence data. Whereas the common-parameter and the shifted-along-ROC models yield similar estimates for the same prior (the uniform on the probability scale is equivalent to the standard logistic on the logit scale), a minor change to the priors used by the shifted-along-ROC model leads to results that are qualitatively and quantitatively different. The sensitivity of the estimated prevalence to the prior suggests that the assay performance data are not sufficient to determine either the sensitivity or the specificity with precision. Even the minor shrinkage implied by changing the prior from the standard logistic to the normal(0,1.6) provides a small yet crucial protection against overfitting and leads to the model with by far the highest marginal likelihood.
Are the Bayesian analysis results reasonable? I still can't vouch for the quality of the source data or rule out the biases inherent in it, but at least the calculations were done lege artis. Interestingly enough, the estimated prevalence of COVID19 infection (1-2%) by the Bayesian methodology described here is rather similar to the prevalence (<2%) reported in Telluride, Colorado.
The analysis reported herein could be made a lot more comparable to the analyses reported by Bendavid et al if the authors had provided the strata data (positive and negative test results and sample weights). In the absence of this information, the use of a fudge factor is the best equivalent approximation to the analyses reported in the preprint.
Is the Santa Clara study useful for our understanding of the COVID19 epidemic?
I feel that the seroprevalence study in Santa Clara is not the "bombshell" study that I initially thought it would be (so apologies to all my Twitter followers whom I may have misled). There are many methodological issues, starting with the study design and extending to the authors' analyses of test performance. Having spent hours troubleshooting the code (and writing this blog post!), I find that my feelings align perfectly with Andrew Gelman's:
I think the authors of the above-linked paper owe us all an apology. We wasted time and effort discussing this paper whose main selling point was some numbers that were essentially the product of a statistical error.
I’m serious about the apology. Everyone makes mistakes. I don’t think they authors need to apologize just because they screwed up. I think they need to apologize because these were avoidable screw-ups. They’re the kind of screw-ups that happen if you want to leap out with an exciting finding and you don’t look too carefully at what you might have done wrong.
COVID19 is a serious disease, which is far worse than the flu in terms of contagiousness, severity and lethality of the severe forms and level of medical intensive care support needed to avert a lethal outcome. Knowing the prevalence will provide one of the many missing pieces of the puzzle we are asked to solve. This piece of information will allow us to mount an effective public health response and reopen our society and economy without risking countless lives.
Disclaimer: I provided the data for my analyses with the hope that someone can correct me if I am wrong and corroborate me if I am right. Open Science is the best defense against bad science, and making the code available is a key step in the process.
Christos Argyropoulos is a clinical nephrologist, amateur statistician and Division Chief of Nephrology at the University of New Mexico Health Sciences Center. This post originally appeared on his blog here.
The post Is the COVID-19 Antibody Seroprevalence in Santa Clara County really 50-85 fold higher than the number of confirmed cases? appeared first on The Health Care Blog.
lauramalchowblog · 5 years ago
Text
Is the COVID-19 Antibody Seroprevalence in Santa Clara County really 50-85 fold higher than the number of confirmed cases?
Tumblr media
By CHRISTOS ARGYROPOULOS
I am writing this blog post (the first after nearly two years!) in lockdown mode because of the rapidly spreading SARSCoV2 virus, the causative agent of the COVID19 disease (a poor choice of a name, since the disease itself is really SARS on steroids). One interesting feature of this disease, is that a large number of patients will manifest minimal or no symptoms (“asymptomatic” infections), a state which must clearly be distinguished from the presymptomatic phase of the infection. In the latter, many patients who will eventually go on to develop the more serious forms of the disease have minimal symptoms. This is contrast to asymptomatic patients who will never develop anything more bothersome than mild symptoms (“sniffles”), for which they will never seek medical attention. Ever since the early phases of the COVID19 pandemic, a prominent narrative postulated that asymptomatic infections are much more common than symptomatic ones. Therefore, calculations such as the Case Fatality Rate (CFR = deaths over all symptomatic cases) mislead about the Infection Fatality Rate (IFR = deaths over all cases). Subthreads of this narrative go on to postulate that the lockdowns which have been implemented widely around the world, are an overkill because COVID19 is no more lethal than the flu, when lethality is calculated over ALL infections. Whereas, the politicization of the lockdown argument is of no interest to the owner of this blog (after all the virus does not care whether its victim is rich or poor, white or non white, Westerner or Asian), estimating the prevalence of individuals exposed to the virus but never developed symptoms is important for public health, epidemiological and medical care reasons. Since these patients do not seek medical evaluation, they will not detected by acute care tests (viral loads in PCR based assays). However such patients, may be detected after the fact by looking for evidence of past infection, in the form of circulating antibodies in the patients’ serum. I was thus very excited to read about the release of a preprint describing a seroprevalence study in Santa Clara County, California. This preprint described the results of a cross – sectional examination of the residents in the county in Santa Clara, with a lateral flow immunoassay (similar to a home pregnancy kit) for the presence of antibodies against the SARSCoV2 virus. The presence of antibodies signifies that the patient was not only exposed at some point to the virus, but this exposure led to an actual infection to which the immune system responded by forming antibodies. These resulting antibodies persist for far longer than the actual infection and thus provide an indirect record of who was infected . More importantly, such antibodies may be the only way to detect asymptomatic infections, because these patients will not manifest any symptoms that will make them seek medical attention, when they were actively infected. Hence, the premise of the Santa Clara study is a solid one and in fact we need many more of these studies. But did the study actual deliver? Let’s take a deep dive in the preprint.
What did the preprint claim to show? The authors major conclusions are :
The population prevalence of SARS-CoV-2 antibodies in Santa Clara County implies that the infection is much more widespread than indicated by the number of confirmed cases.
Population prevalence estimates can now be used to calibrate epidemic and mortality projections.
Both conclusions rest upon a calculation that claims :
These prevalence estimates represent a range between 48,000 and 81,000 people infected in Santa Clara County by early April, 50-85-fold more than the number of confirmed cases.
https://www.medrxiv.org/content/10.1101/2020.04.14.20062463v1
If these numbers were true, they would constitute a tectonic shift in our understanding of this disease. For starters, this would imply that the virus is substantially more contagious than what people think (though recent analyses of both Chinese and European data also show this point quite nicely). Secondly, if the number of asymptomatic infections is very high, then perhaps we are close to achieving herd immunity and thus perhaps we can relax the lockdowns. Finally, if the asymptomatic infections are numerous, then the disease is not that lethal and thus the lockdowns were an over-exaggeration , which killed the economy for the “sniffles” or the “flu”. Since the author’s argument rests upon a calculation, it is important to ensure that the calculation was done correctly. If we find deficiencies in the calculation, then the author’s conclusions become a collapsing house of cards. Statistical calculations of this sort can mislead as a result of poor data AND/OR poor calculation. While this blog post focuses on the calculation itself, there are certain data deficiencies that will be pointed out along the way.
How did the authors carry out their research?
The high level description of the approach the authors took, may be found in their abstract:
 We measured the seroprevalence of antibodies to SARS-CoV-2 in Santa Clara County. Methods On 4/3-4/4, 2020, we tested county residents for antibodies to SARS-CoV-2 using a lateral flow immunoassay. Participants were recruited using Facebook ads targeting a representative sample of the county by demographic and geographic characteristics. We report the prevalence of antibodies to SARS-CoV-2 in a sample of 3,330 people, adjusting for zip code, sex, and race/ethnicity. We also adjust for test performance characteristics using 3 different estimates: (i) the test manufacturer’s data, (ii) a sample of 37 positive and 30 negative controls tested at Stanford, and (iii) a combination of both
https://www.medrxiv.org/content/10.1101/2020.04.14.20062463v1
First and foremost, we note the substantial degree of selection bias inherent in the use of social media for recruitment of participants in the study. This resulted in a cohort that was quite non-representative of the residents of Santa Clara county and in fact the authors had to rely on post-stratification weighting to come up with a representative cohort. These methods can produce reasonable results if the weighting scheme is selected carefully. Andrew Gelman analyzed the scheme and strategy adopted by the study authors. His blog post, is an excellent read about the many design issues of the Santa Clara study and you should read it, if you have not read it already. There are many other sources of selection bias identified by other commentators:
I don't think any of the primary authors of the Stanford seroprevalence study are on Twitter, but there are some questions about the work that I haven't seen asked elsewhere that I would love to see addressed at some point. https://t.co/sJuOnt4mTk
— mbeisen (@mbeisen) April 20, 2020
When taken together these comments suggest some serious deficiencies with the source data. Notwithstanding these issues, a major question still remains: namely whether the authors properly accounted for the assay inaccuracy in their calculations. Gelman put it succinctly as follows:
This is the big one. If X% of the population have the antibodies and the test has an error rate that’s not a lot lower than X%, you’re in big trouble. This doesn’t mean you shouldn’t do testing, but it does mean you need to interpret the results carefully.
https://statmodeling.stat.columbia.edu/2020/04/19/fatal-flaws-in-stanford-study-of-coronavirus-prevalence/
The shorter version , of this argument would thus characterize the positive tests in this “bombshell” preprint as:
This is the bombshell #COVID19 preprint of the day COVID-19 Antibody Seroprevalence in Santa Clara County, California https://t.co/jvqbgMwFT7 Population prevalence of #SARSCoV2 seropositivity between 2.5-4.2% (implication this virus must have an effectively infinite R0) pic.twitter.com/lkGp3MypfH
— ChristosArgyropoulos (@ChristosArgyrop) April 17, 2020
A related issue concerns the uncertainty estimates reported by the authors, which is a byproduct of the calculations . Again, this issue was highlighted by Gelman:
3. Uncertainty intervals. So what’s going on here? If the specificity data in the paper are consistent with all the tests being false positives—not that we believe all the tests are false positives, but this suggests we can’t then estimate the true positive rate with any precision—then how do they get a confidence nonzero estimate of the true positive rate in the population?
who similarly to Rupert Beale goes on to conclude:
First, if the specificity were less than 97.9%, you’d expect more than 70 positive cases out of 3330 tests. But they only saw 50 positives, so I don’t think that 1% rate makes sense. Second, the bit about the sensitivity is a red herring here. The uncertainty here is pretty much entirely driven by the uncertainty in the specificity.
To summarize, the authors of the preprint conclude that there many many more infections in Santa Clara than those captured by the current testing, while others looking at the same data and the characteristics of the test employed think that the asymptomatic infections are not as frequent as the Santa Clara preprint claims to be. In the next paragraphs, I will reanalyze the summary data reported by the preprint authors and show they are most definitely wrong: while there are asymptomatic COVID19 presentations, their number is no where close to being 50-80 fold higher than the symptomatic ones as the authors claim.
A Bayesian Re-Analysis of the Santa Clara Seroprevalence Study
There are three pieces of data in the preprint that are relevant in answering the preprint question:
The number of positives(y=50) out of (N=3,330) tests performed in Santa Clara
The characteristics of the test used as reported by the manufacturer. This is conveniently given as a two by two table cross classifying the readout of the assay (positive: +ve or negative: -ve) in disease +ve and -ve gold standard samples. In these samples, we assume that the disease status of the individuals assayed is known perfectly. Hence a less than perfect alignment of the assay results to the ground truth, points towards assay imperfections or inaccuracies. For example, a perfect assay would classify all COVID19 +ve patients as positive and all COVID19 -ve as assay negative.
Assay +veAssay -veCOVID19 +ve787COVID19 -ve2369
Premier Biotech Lateral Flow COVID19 Assay, Manufacturer Data
The characteristics of the test in a local, Stanford cohort of patients. This may also be given as 2 x 2 table:
Assay +veAssay -veCOVID19 +ve2512COVID19 -ve030
Premier Biotech Lateral Flow COVID19 Assay, Stanford Data
We will build the analysis of these three pieces of data in stages starting with the analysis of the assay performance in the gold standard samples.
Characterizing test performance
Eyeballing the test characteristics in the two distinct gold standard sample collections, immediately shows that the test behaved differently in the gold standard samples used by the manufacturer and by the Stanford team. In particular, sensitivity (aka true positive rate is much higher in the manufacturer samples v.s. the Stanford samples, i.e. 78/85 vs 25/37 respectively. Conversely, the Specificity (or true negative rate) is smaller in the manufacturer dataset than the Stanford gold standard data: 369/371 is certainly smaller than 30/30. Perhaps the reason for these differences may be found in how Stanford constructed their gold standard samples:
Test Kit Performance
The manufacturer’s performance characteristics were available prior to the study (using 85 confirmed positive and 371 confirmed negative samples). We conducted additional testing to assess the kit performance using local specimens. We tested the kits using sera from 37 RT-PCR-positive patients at Stanford Hospital that were also IgG and/or IgM-positive on a locally developed ELISA assay. We also tested the kits on 30 pre-COVID samples from Stanford Hospital to derive an independent measure of specificity.
Perhaps, the ELISA assay developed in house by the Stanford scientists is more sensitive than the lateral flow assay used in the study. Or the Stanford team used the lateral flow assay in a way that decreased its sensitivity. While lateral flow immunoassays are conceptually, “yes” or “no” assays, there is still room for suspect or intermediate results (as shown below for the pregnancy test)
Tumblr media
Operation of a lateral flow immunoassay
While no information is given about how the intermediate results were handled (or whether the assay itself generates more than one grades of intermediate results), it is conceivable that these were handled differently by the investigators . For example it is possible that intermediate results the manufacturer labelled as positive in their validation, were labeled as negative by the investigators. Even more likely, the participant samples were handled differently by the investigators, so that the antibody levels required to give a positive result differed in the Santa Clara study relative to the tests run by the manufacturer. In either case, the Receiver Operating Characteristic (ROC) curve in the field differs from the one of the manufacturer. Furthermore, we have very little evidence that it will not differ again in future applications. Finally, there is the possibility of good old fashioned sampling variability in the performance of the assay in the gold standard samples, without any actual substantive differences in the assay performance. Hence, the two main possibilities we considered are:
Common Test Characteristic Model: We assume that random variation in the performance of the test in the two gold standards is at play, and we estimate a common sensitivity and specificity over the two gold standard sets. In probabilistic terms, we will assume the following model for the true positive and negative rates in the two gold standard datasets :
 and
 . In these expressions, the TP, FN, TN, FP stand for the True Positive, False Negative, True Negative and False Positive respectively, while the subscript, i, indexes the two gold standard datasets.
Shifted along (the) ROC Model: In this model, we assume a bona fide shift in the characteristics of the assay when tested in the gold standard tests. This variation in characteristics, is best conceptualized as a shift along the straight line in the binormal, normal deviate plot . This plot transforms the standard ROC plots of sensitivity v.s. 1-specificity so that the familiar ROC bent curves look like straight lines. Operationally, we assume that there will be run specific variation in the assay characteristics when it is applied to future samples. The performance of the assay in the two gold standard sets provide a measure of this variability, which may resurface in subsequent use of the assay. When analyzing the data from the Santa Clara survey, we will acknowledge this variability by assuming that the assay performance (sensitivity/specificity) in this third application (red) interpolates the performance noted in the two gold standards (black).
Tumblr media
The shift-along the ROC model is specified by fitting separate specificities and sensitivities for the two gold standard datasets, which are then transformed into the binormal plot scale , averaged and then transformed back to normal probability scale to calculate an average sensitivity and specificity. For computational reasons, we parameterize the models for the sensitivity and specificity in logit space. This parameterization also allows us to explore the sensitivity of the model to the choice of the model priors for the Bayesian analysis.
Model for the observed positive rates
Tumblr media
The model for the observed positive counts is a binomial law , relating the positive tests in the Santa Clara sample to the total number of tests: Priors For the common test characteristic model, we will assume standard, uniform, non-informative priors for the Sensitivity, Specificity and Prevalence. The model may be fit with the following STAN program:
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950functions {}data { int casesSantaClara;   // Cases in Santa Claraint NSantaClara;      // ppl sampled in Santa Clara int TPStanford;       // true positive in Stanfordint NPosStanford;     // Total number of positive cases in Stanford dataset int TNStanford;       // true negative in Stanfordint NNegStanford;     // Total number of negative cases in Stanford dataset int TPManufacturer;       // true positive in Manufacturer datasetint NPosManufacturer;     // Total number of positive cases in Manufacturer dataset int TNManufacturer;       // true negative in Manufacturer datasetint NNegManufacturer;     // Total number of negative cases in Manufacturer dataset }transformed data { }parameters {real<lower=0.0,upper=1.0> Sens;real<lower=0.0,upper=1.0> Spec;real<lower=0.0,upper=1.0> Prev; }transformed parameters {  }model { casesSantaClara~binomial(NSantaClara,Prev*Sens+(1-Prev)*(1-Spec));TPStanford~binomial(NPosStanford,Sens);TPManufacturer~binomial(NPosManufacturer,Sens);TNStanford~binomial(NNegStanford,Spec);TNManufacturer~binomial(NNegManufacturer,Spec); Prev~uniform(0.0,1.0);Spec~uniform(0.0,1.0);Spec~uniform(0.0,1.0);}generated quantities {real totposrate;real rateratio;totposrate = (Prev*Sens+(1-Prev)*(1-Spec));rateratio = Prev/totposrate;
For the shifted-along-ROC model, we explored two different settings of noninformative priors for the logit of sensitivity and specificity:
a logistic(0,1) prior for the logit of the sensitivities and specificities that is mathematically equivalent to a uniform prior for the sensitivity and specificity in the probability scale.
a normal(0,1.6) prior for the logits. While these two densities are very similar, the normal (black) has thinner tails, shrinking the probability estimates more than the logistic one (red) .
Tumblr media
Comparison of logistic(0,1) vs normal (0,1.6) prior densities
The shifted-ROC model is fit by the following STAN code (only the normal prior model is inlined below)
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768functions {}data { int casesSantaClara;   // Cases in Santa Claraint NSantaClara;      // ppl sampled in Santa Clara int TPStanford;       // true positive in Stanfordint NPosStanford;     // Total number of positive cases in Stanford dataset int TNStanford;       // true negative in Stanfordint NNegStanford;     // Total number of negative cases in Stanford dataset int TPManufacturer;       // true positive in Manufacturer datasetint NPosManufacturer;     // Total number of positive cases in Manufacturer dataset int TNManufacturer;       // true negative in Manufacturer datasetint NNegManufacturer;     // Total number of negative cases in Manufacturer dataset }transformed data { }parameters {real<lower=0.0,upper=1.0> Prev;real logitSens[2];real logitSpec[2]; }transformed parameters {real<lower=0.0,upper=1.0> Sens;real<lower=0.0,upper=1.0> Spec; // Use the sensitivity and specificity by the mid point between the two kits // mid point is found in inv_Phi space of the graph relating Sensitivity to 1-Specificity// note that we break down the calculations involved in going from logit space to inv_Phi// space in the generated quantities sessions for clarity Sens=Phi(0.5*(inv_Phi(inv_logit(logitSens[1]))+inv_Phi(inv_logit(logitSens[2]))));Spec=Phi(0.5*(inv_Phi(inv_logit(logitSpec[1]))+inv_Phi(inv_logit(logitSpec[2]))));}model { casesSantaClara~binomial(NSantaClara,Prev*Sens+(1-Prev)*(1-Spec));TPStanford~binomial_logit(NPosStanford,logitSens[1]);TPManufacturer~binomial_logit(NPosManufacturer,logitSens[2]); TNStanford~binomial_logit(NNegStanford,logitSpec[1]);TNManufacturer~binomial_logit(NNegManufacturer,logitSpec[2]); logitSens~logistic(0.0,1.0);  logitSpec~logistic(0.0,1.0);   Prev~uniform(0.0,1.0);}generated quantities {// various post sampling estimatesreal totposrate;        // Total positive ratereal kitSens[2];        // Sensitivity of the kits   real kitSpec[2];        // Spec of the kitsreal rateratio;         // ratio of prevalence to total positive rate totposrate = (Prev*Sens+(1-Prev)*(1-Spec));kitSens=inv_logit(logitSens);kitSpec=inv_logit(logitSpec);rateratio = Prev/totposrate;}
Estimation of the Prevalence of COVID19 Seropositivity in Santa Clara County
Preprint author analyses and interpretation
Table 2 of the preprint summarizes the estimates of the prevalence of seropositivity in Santa Clara. This table is reproduced below for ease of reference
Approach | Point Estimate | 95% Confidence Interval
Unadjusted | 1.50 | 1.11 – 1.97
Population-adjusted | 2.81 | 2.24 – 4.16
Population and test performance adjusted (Manufacturer’s data) | 2.49 | 1.80 – 3.17
Population and test performance adjusted (Stanford data) | 4.16 | 2.58 – 5.70
Population and test performance adjusted (Manufacturer + Stanford data) | 2.75 | 2.01 – 3.49
Estimated Prevalence of COVID19 Seropositivity (Bendavid et al)
The unadjusted estimate is computed on the basis of the observed positive counts (50/3,330 tests), and the population-adjusted estimate is obtained by accounting for the non-representativeness of the sample using post-stratification weights. Finally, the last three rows give the estimates of the prevalence while also accounting for the test characteristics. The claim that the prevalence of COVID19 infection is 50-85 fold higher than the number of confirmed cases rests on the estimated point prevalences of 2.49% and 4.16%. Computation of these estimates requires the sensitivity and specificity of the immunoassay, which the authors quote as:
Our estimates of sensitivity based on the manufacturer’s and locally tested data were 91.8% (using the lower estimate based on IgM, 95 CI 83.8-96.6%) and 67.6% (95 CI 50.2-82.0%), respectively. Similarly, our estimates of specificity are 99.5% (95 CI 98.1-99.9%) and 100% (95 CI 90.5-100%). A combination of both data sources provides us with a combined sensitivity of 80.3% (95 CI 72.1-87.0%) and a specificity of 99.5% (95 CI 98.3-99.9%).
The results obtained are crucially dependent upon the values of the sensitivity and specificity, something that the authors explicitly state in the discussion:
For example, if new estimates indicate test specificity to be less than 97.9%, our SARS-CoV-2 prevalence estimate would change from 2.8% to less than 1%, and the lower uncertainty bound of our estimate would include zero. On the other hand, lower sensitivity, which has been raised as a concern with point-of-care test kits, would imply that the population prevalence would be even higher. New information on test kit performance and population should be incorporated as more testing is done and we plan to revise our estimates accordingly.
While the authors claim they have accounted for these test characteristics, the method of adjustment is not spelled out in detail in the main text. Only by looking into the statistical appendix do we find the following disclaimer about the approximate formula used:
There is one important caveat to this formula: it only holds as long as (one minus) the specificity of the test is higher than the sample prevalence. If it is lower, all the observed positives in the sample could be due to false-positive test results, and we cannot exclude zero prevalence as a possibility. As long as the specificity is high relative to the sample prevalence, this expression allows us to recover population prevalence from sample prevalence, despite using a noisy test.
In other words, the authors are making use of an approximation that relies on an assumption about the study results that may or may not hold. But if one is using such an approximation, then one has already decided in advance what the results and the test characteristics should look like. We will not make such an assumption in our own calculations.
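For readers who want to see why the caveat matters: the post does not reproduce the appendix formula itself, but the standard test-error correction for an apparent prevalence (the Rogan-Gladen estimator) has exactly the behaviour described in the quote, and a couple of lines of Python (illustrative only; the 98% specificity value is an arbitrary what-if, not a number from the preprint) make the failure mode visible:

def adjusted_prevalence(p_obs, sens, spec):
    # correct an observed positive rate for imperfect sensitivity/specificity
    return (p_obs + spec - 1.0) / (sens + spec - 1.0)

p_obs = 50 / 3330                                          # unadjusted Santa Clara rate (~1.5%)
print(adjusted_prevalence(p_obs, sens=0.803, spec=0.995))  # combined kit estimates quoted above
print(adjusted_prevalence(p_obs, sens=0.803, spec=0.980))  # what-if: 98% specificity pushes
                                                           # the estimate below zero

Once (1 - specificity) exceeds the observed positive rate, the numerator goes negative and zero prevalence can no longer be excluded, which is precisely the caveat relegated to the appendix.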
Bayesian Analysis
The Bayesian analyses we report here roughly correspond to a combination that Bendavid et al did not consider, namely an analysis using only the test characteristics, but without the post-stratification weights. Even though a Bayesian analysis using such weights is possible, the authors do not report the number of positives and negatives in each stratum, along with the relevant weights. If this information had been made available, we could have repeated the same calculations and seen what we obtain. While I can’t reproduce the exact analysis, examination of Table 2 in Bendavid et al suggests that weighting should increase the prevalence by a “fudge” factor. For the purpose of this section we take the fudge factor to be equal to the ratio of the population-adjusted prevalence over the unadjusted prevalence: 2.81/1.50 ~ 1.87. For fitting the Bayesian analyses, I used the No-U-Turn Sampler implemented in Stan; I simulated five chains, using 5,000 warm-up iterations and 1,000 post-warmup samples to compute summaries. Rhat was <1.01 for all parameters in all models (R and Stan code for all analyses are provided here to ensure reproducibility).
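The author’s code is in R; purely as an illustration of the sampler settings just described, a roughly equivalent fit of the common-parameters model through CmdStanPy (a Python interface to Stan, not what was used in the post) might look like the sketch below. The validation counts are back-calculated from the percentages quoted earlier and the Stan file name is assumed, so treat them as placeholders.

from cmdstanpy import CmdStanModel

data = {
    "casesSantaClara": 50, "NSantaClara": 3330,
    # counts below are reconstructed from the quoted percentages (67.6% = 25/37, 100% = 30/30,
    # 91.8% = 78/85, 99.5% = 369/371); they are not listed explicitly in the post
    "TPStanford": 25, "NPosStanford": 37,
    "TNStanford": 30, "NNegStanford": 30,
    "TPManufacturer": 78, "NPosManufacturer": 85,
    "TNManufacturer": 369, "NNegManufacturer": 371,
}

model = CmdStanModel(stan_file="common_parameters.stan")   # assumed file name
fit = model.sample(data=data, chains=5, iter_warmup=5000, iter_sampling=1000, seed=1)
print(fit.summary().loc[["Prev", "Sens", "Spec"]])         # check means, intervals and R_hat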
Sensitivities and specificities for the three Bayesian analyses: common-parameters, shifted-along-ROC (logistic prior), shifted-along-ROC (normal prior) are shown below:
Sensitivity
Model | Mean | Median | 95% Credible Interval
Common Parameters | 0.836 | 0.838 | 0.768 – 0.897
Shifted-along-ROC (logistic) | 0.811 | 0.813 | 0.733 – 0.882
Shifted-along-ROC (normal) | 0.810 | 0.813 | 0.734 – 0.878
Specificity
Model | Mean | Median | 95% Credible Interval
Common Parameters | 0.993 | 0.994 | 0.986 – 0.998
Shifted-along-ROC (logistic) | 0.991 | 0.991 | 0.983 – 0.998
Shifted-along-ROC (normal) | 0.989 | 0.988 | 0.982 – 0.995
Bayesian Analysis of Assay Characteristics
The three models yield similar, yet not identical, estimates of the assay characteristics, and one might even expect the estimated prevalences to be somewhat similar to those of the preprint authors. The estimated prevalence under the three approaches (without application of the fudge factor) is shown below:
Model | Mean | Median | 95% Credible Interval
Common Parameters | 1.03 | 1.05 | 0.16 – 1.88
Shifted-along-ROC (logistic) | 0.81 | 0.81 | 0.50 – 1.84
Shifted-along-ROC (normal) | 0.56 | 0.50 | 0.02 – 1.46
Model Estimated Prevalence (%)
Application of the fudge factor would increase the point estimates to 1.93%, 1.52% and 1.05%, which are still more than 50% smaller than the prevalence estimates computed by Bendavid et al. To conclude our Bayesian analyses, we compared the marginal likelihoods of the three models by means of the bridge sampler. This computation allows us to roughly see which of the three model/prior combinations is best supported by the data. The comparison provided overwhelming support for the shifted-along-ROC model with the normal (shrinkage) prior:
Model | Posterior Probability
Common Parameters | 0.032
Shifted-along-ROC (logistic) | 0.030
Shifted-along-ROC (normal) | 0.938
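For completeness, converting bridge-sampler marginal likelihoods into posterior model probabilities like those above is a one-line normalisation under equal prior model weights. The log marginal likelihood values in the sketch below are invented purely to illustrate the arithmetic (chosen to roughly reproduce the proportions in the table, not taken from the analysis):

import numpy as np

# log marginal likelihoods: common, shifted-ROC (logistic), shifted-ROC (normal)
log_ml = np.array([-25.0, -25.1, -21.6])      # illustrative values only
log_ml -= log_ml.max()                        # stabilise the exponentials
post_prob = np.exp(log_ml) / np.exp(log_ml).sum()
print(dict(zip(["common", "shifted-logistic", "shifted-normal"], post_prob.round(3))))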
Conclusions
In this rather long post, I undertook a Bayesian analysis of the seroprevalence survey in Santa Clara county. Contrary to the authors’ assertions, this formal Bayesian analysis suggests a much lower seroprevalence of COVID19 infections. The estimated prevalence is only about 20-fold higher than the number of confirmed cases (point estimate), rather than 50-85 fold higher, with a 95% credible interval of 0.75-55 fold for the most probable model.
The major driver of the prevalence is the specificity of the assay (not the sensitivity), so particular choices for this parameter will have a huge impact on the estimated prevalence of COVID19 infection based on seroprevalence data. Whereas the common-parameters and the shifted-along-ROC models yield similar estimates for the same prior (the uniform in the probability scale is equivalent to the standard logistic in the logit scale), a minor change to the priors used by the shifted-along-ROC model leads to results that are qualitatively and quantitatively different. The sensitivity of the estimated prevalence to the prior suggests that the assay performance data are not sufficient to determine either the sensitivity or the specificity with precision. Even the minor shrinkage implied by changing the prior from the standard logistic to the normal(0,1.6) provides a small, yet crucial, protection against overfitting and leads to the model that is overwhelmingly favored in terms of marginal likelihood.
Are the Bayesian analysis results reasonable? I still can’t vouch for the quality of the source data or rule out the biases inherent in them, but at least the calculations were done lege artis. Interestingly enough, the prevalence of COVID19 infection estimated by the Bayesian methodology described here (1-2%) is rather similar to the prevalence (<2%) reported in Telluride, Colorado.
The analysis reported herein could be made a lot more comparable to the analyses reported by Bendavid et al, if the authors had provided the strata data (positive and negative test results and sample weights). In the absence of this information, the use of fudge factors is the best equivalent approximation to the analyses reported in the preprint. Is the Santa Clara study useful for our understanding of the COVID19 epidemic?
I feel that the seroprevalence study in Santa Clara is not the “bombshell” study that I initially felt it would be (so apologies to all my Twitter followers whom I may have misled). There are many methodological issues, starting with the study design and extending to the authors’ analyses of test performance. Having spent hours troubleshooting the code (and writing the blog post!), my feelings align perfectly with Andrew Gelman’s:
I think the authors of the above-linked paper owe us all an apology. We wasted time and effort discussing this paper whose main selling point was some numbers that were essentially the product of a statistical error.
I’m serious about the apology. Everyone makes mistakes. I don’t think the authors need to apologize just because they screwed up. I think they need to apologize because these were avoidable screw-ups. They’re the kind of screw-ups that happen if you want to leap out with an exciting finding and you don’t look too carefully at what you might have done wrong.
COVID19 is a serious disease, which is far worse than the flu in terms of contagiousness, severity and lethality of the severe forms and level of medical intensive care support needed to avert a lethal outcome. Knowing the prevalence will provide one of the many missing pieces of the puzzle we are asked to solve. This piece of information will allow us to mount an effective public health response and reopen our society and economy without risking countless lives.
Disclaimer: I provided the data for my analyses with the hope that someone can correct me if I am wrong and corroborate me if I am right. Open Science is the best defense against bad science, and making the code available is a key step in the process.
Christos Argyropoulos is a clinical nephrologist, amateur statistician and Division Chief of Nephrology at the University of New Mexico Health Sciences Center. This post originally appeared on his blog here.
The post Is the COVID-19 Antibody Seroprevalence in Santa Clara County really 50-85 fold higher than the number of confirmed cases? appeared first on The Health Care Blog.
0 notes
binnedrubbish · 5 years ago
Text
5/12/19 Notes
Lab Meeting Prep Pipeline:
(May 2nd, 2019 at 2:38 p.m.) 
[ ] Read the Results & Discussion cover to cover
[ ] Complete slides for all figures
[ ] Give a practice presentation
[ ] Read methods 
[ ] Complete fluorescence slides
[ ] Decide how to deal with ‘relationship between calcium activity and movement’ section
[ ] Give a practice presentation 
[  ] Read supplementary material cover to cover
[  ] Give a practice presentation 
Note to self: Relax.  Be meticulous.   Be disciplined.  Keep calm, do your best, trust your team.  
—— 
——
Advanced Optimization 
8 20 905
Live Action Poem, February 2nd, 6:41
Went to Brazil out of spite and saw
stone Jesus, arms open for a hug,
bought street weed, twice, from the same vendor
out of a reckless love for reckless love.
Hoped for a tropical muse and found 
a strong handshake from a dangerous man.
Holed up in Rio de Janeiro with piles
of paper money and paced all alone
angry at nothing if only for the moment.
Rain dampened slick stone walkaways,
waiters were too nice and I tipped too much.
One offered to be a bodyguard , violence
hinted in every smirking human moment.
God, I loved being a target, smug,
dumb, flitting away American Dollars.
Jesus Christ looming in stone on a hill top.
Titties and marijuana, iconic primadonna 
extravagant flora, dying fauna, fawning
over the climate. I went to Brazil
on an off month. To hole up 
safe from my sprawling little lovely life. 
To Do 26.1.19
[x] Cristina - Search for Hippocampus Models
[x] Ana G. - Draft e-mail call for interest in “Live Action Science”
[  ] 
Data science Club Thursday at 5:00 p.m. 
Astavakrasana 
laser-scanning photostimulation (LSPS) by UV glutamate uncaging. 
12.1.19 Goals
[x] Some Portuguese 
[x] Mouse Academy - first read 
[ / ] Dynamic mesolimbic dopamine 
[x] Water rats * SMH
Acorn - tracks impact | BetaWorks | 2 years of money | PitchBook | Social Impact Start Up 
Mission Aligned Investors | Metrics | Customer Acquisition Cost | Clint Corver -> Chain of Contacts -> Who To Talk to (Scope: ~100) 
Money Committed || Sparrow || Decision Analysis —> Ulu Ventures [500k] [Budget x ] 
Ivan -> iOS Engineering { Bulgarian DevShop } 
[market mapping] Metrics -> Shrug 
Peter Singer - Academic Advisory Board … 
[1 million ]
Product market testing 
Foundation Directory Online  - Targeted , Do Your Homework 
https://www.simonsfoundation.org/2018/11/19/why-neuroscience-needs-data-scientists/
Head-fixed —> 
~INHIBITION EXPERIMENT TRAINING PLAN~
STOP MICE:  20th.  GIVE WATER: 20th (afternoon) - 30th.  DEPRIVE: 31st... (Morning) RESUME: Jan 2nd.
21st - BLEACH/DEEP CLEAN BOXES 1-14 (Diluted bleach; Flush (with needles out) - Open Arduino Sketch with Continuously open Valves - PERFUSE System) *[NOT BOX 11 or 5]*; Run 15 mL of Bleach per syringe; Copious water through valves; Leave dry.
———
http://www.jneurosci.org/content/preparing-manuscript#journalclub
Friday - Dec. 14th, 2018 
[x] - Complete 2019 ‘Goals and Blueprint’ 
[x] - 2-minute Summary ‘Properties of Neuron in External Globus Pallidus Can Support Optimal Action Selection 
[  ] MatLab for Neuroscientists :: Basic Bayesian Bearded Terrorist probability plots 
[x] Statistics 101: Linear Regression 
“Golden Girls” - Devendra Banhart
“King” by Moor - FIREBEAT 
Reread - Section 3.3 to  
Monday - Apply for DGAV License (MAKE SHORT CV)
SAMPLE: ‘Sal’ From Khan Academy 
Make short CV
Tiago - Certificate 
MATH:
“We explicitly focus on a gentle introduction here, as it serves our purposes. If you are in 
need of a more rigorous or comprehensive treatment, we refer you to Mathematics for Neuroscientists by Gabbiani and Cox. If you want to see what math education could be like, centered on great explanations that build intuition, we recommend Math, Better Explained by Kalid Azad.”
Jacksonian March seizure (somatosensory) 
Tara LeGates > D1/D2 Synapses
Scott Thompson
Fabrizio Gabbiani - Biophysics - Sophisticated and reasonable approach 
Quote For Neuroscience Paper:
“Every moment happens twice: inside and outside, and they are two different histories.”
— Zadie Smith, White Teeth  
Model Animal: Dragonfly? Cats. Alligators. 
Ali Farke Toure 
Entre as 9 hora e o meio-dia ele trabalha no computador. 
Ele volta para  o trabalha à uma e meia.  
Ele vai as compras depois do trabalho.
A noite, depois do jantar, ele e a mulher veem televisão.
As oito vou de bicicleta para o trabalho.  (go)
As oito venho de bicicleta para o trabalho.  (come) 
A que horas começa a trabalhar?
Eu começo a trabalhar os oito e meia.
Normalmente… 
Eu caminho cerca de Lisbon.
É muito triste! Eu faço nada! Talvez, eu caminho cerca de Lisbon.  Talvez eu leio um livro.  Talvez eu dormi.    Eu vai Lx Factory.  
Depois de/do (after) 
antes de/do (before) 
Monday -> Mice 
MATLAB!
-
“New ways of thinking about familiar problems.” 
~*NOVEMBER GOALS*~ 
> Permanent MatLab Access [x] -> Tiago has license 
> Order Mouse Lines [ ] -> Health report requested… Reach out to Vivarium about FoxP2 
   -> Mash1 line -> FoxP2 expression?  
> Finish ‘First Read Through’ [ ] 
> Figure 40 [ ]
SAMPLE : ‘Afraid of Us’ Jonwayne, Zeroh 
Monday Nov 5th Goals: 
> Attentively watch:
> https://www.youtube.com/watch?v=ba_l8IKoMvU (Distributed RL)
> https://www.youtube.com/watch?v=bsuvM1jO-4w (Distributed RL | The Algorithm) 
MatLab License 
Practical Sessions at the CCU for the Unknown between 19 - 22 Nov 2018 (provisional programme attached)
Week of November 5th - Handle Bruno’s Animals 
Lab Goals - 
“Deep Networks - Influence Politics Around the World”
Paton Lab Meeting Archives
Strategy: Read titles/abstracts follow gut on interesting and relevant papers
Goals: Get a general sense of the intellectual history of the lab, thought/project trajectories, researchers and work done in the field and neighboring fields.
Look through a GPe/Arkypallidal lens… what can be revisited with new understanding?
First Read Through 
[x] 2011 - (22 meetings || 10/12 - SLAM camera tracking techniques)  
[ x] 2012a (18 meetings) 
 [x] 2012b (15 meetings - sloppy summary sentences)
[ x] 2013a (19 meetings - less sloppy summaries jotted down)
[x] 2013b (17 meetings) 
[x] 2014a (21 meetings) (summaries in progress)
[x] 2014b 
[x] 2015 (23 meetings)
[ ] 2016 (23 meetings) 
Current 
“I like, I wish, I wonder”
“Only Yesterday” Pretty Lights
retrosplenial dysgranular cx (?)
retrosplenial granular cx, c (?)
fornix (?)
Stringer 2018 arXiv
Lowe and Glimpsher 
November Goals:
[  ] GPe literature - 
[ x ] Dodson & Magill
[  x] Mastro & Gittis
[  ] Chu & Bevan 
[x] Modeling (extra credit -Bogacz)
[  ] Principles of Neural Science: Part IV
[ x ] MatLab license… Website program… 
Extra credit:
Side projects [/ ] Neuroanatomy 40
[ -> ] ExperiMentor - Riberio, Mainen scripts… Paton! -> LiveAction Science
MACHINE LEARNING 
Week of Oct 29th - 
Symposium Week!
Wyatt -> Johns Hopkins -> He got into American University! 
Belly Full Beat (MadLib album Drive In) 
“The human brain produces in 30 seconds as much data as the Hubble Space Telescope has produced in its lifetime.” 
Sequence of voltage sensors -> ArcLite -> Quasar -> Asap -> Voltron -> ???
Muscarine -> Glutamate 
Ph Sensitive 
cAMP
Zinc sensitive 
5 ways to calculate delta f
2 main ways 
SNR Voltage — 
Dimensionality reduction of a data set: When is it spiking?
5 to 10 2-photon microscope open crystal 
…Open window to a million neuron…
Week of 10/15/18
Monday: Travel
Tuesday: Rest
Wednesday: Begin rat training.  Reorient.
Thursday:
Friday:
|| Software synergistically ||
—————
Beam splitter, Lambda, diacritic 
1.6021766208×10^−19
‘sparse coding’
Benny Boy get your programming shit together. 
Week of Oct. 8th, 2018
10/9/18
[  ] Rat shadowing (9:30 a.m.) -> Pushed to next week 
10/8/18
[x] Begin Chapter 13 of Kandel, Schwartz, Jessell
[x] Outline of figure 36
[  ] Read Abdi & Mallet (2015) 
DOPE BEAT MATERIAL - Etude 1 (Nico Muhly, Nadia Sirota) 
Saturday - Chill [x]
Friday - ExperiMentor … mehhhhh scripts?  
Photometry -> Photodiode collects light in the form of voltage (GCaMP) (tdTomato as baseline… how much fluorescence is based on tdTomato, the controlling factor is always luminescence - GCaMP is calcium dependent) :: Collecting from a ‘cone’ or geometric region in the brain.  Data stored and plotted over time… Signals must be corrected… 
Cell populations are firing or releasing calcium.  (GCaMP encoded by virus injection, mice express CRE in a particular cell type).  
———————————————
———————————————
Brain on an Occam’s Razor,
bird on a wire, 
synaptic fatalism integrating 
consistent spiking;
strange looping: is this me? 
Thursday 
“We don’t make decisions, so much as our decisions make us.”
“Blind flies don’t like to fly”
[x] 9:00 a.m. Lab Meeting
[x] 12:00 p.m. - Colloquium
“It was demeaning, to borrow a line from the poet A. R. Ammons, to allow one’s Weltanschauung to be noticeably wobbled.”
“You must not fear, hold back, count or be a miser with your thoughts and feelings. It is also true that creation comes from an overflow, so you have to learn to intake, to imbibe, to nourish yourself and not be afraid of fullness. The fullness is like a tidal wave which then carries you, sweeps you into experience and into writing. Permit yourself to flow and overflow, allow for the rise in temperature, all the expansions and intensifications. Something is always born of excess: great art was born of great terrors, great loneliness, great inhibitions, instabilities, and it always balances them. If it seems to you that I move in a world of certitudes, you, par contre, must benefit from the great privilege of youth, which is that you move in a world of mysteries. But both must be ruled by faith.”
Anaïs Nin
[  ] MatLab trial expires in 1 day * 
[  ] 3:00 p.m. pictures
“We do not yet know whether Arkys relay Stop decisions from elsewhere, or are actively involved in forming those decisions. This is in part because the input pathways to Arkys remain to be determined.”
These studies prompt an interesting reflection about the benefits and conflicts of labeling and classifying neurons at a relatively grainy level of understanding.  
“The authors hypothesize that under normal conditions, hLTP serves an adaptive, homeostatic role to maintain a healthy balance between the hyperdirect and indirect pathway in the STN. However, after dopamine depletion, pathologically elevated cortical input to the STN triggers excessive induction of hLTP at GPe synapses, which becomes maladaptive to circuit function and contributes to or even exacerbates pathological oscillations.”
To Do Week of Oct. 1st - Focus: Big Picture Goals
[ x ] GPe Literature - Hernandez 2015 & Mallet 2016 (Focus on techniques and details)
[  ] MatLab! Lectures 6-7 (Get your hands dirty!)
[ x ] Kandel Chapters 12 - 13 
Tuesday Surgery Induction 10:00 with Andreia 
6:00 - 7:30 
Portuguese
Digitally reconstructed Neurons: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5106405/
To Do Week of, September 24th, 2018 
To Do Week of  Monday, September 17th, 2018
PRIORITY: 
DATA ANALYSIS PROJECT ITI 
———— PAUSE. ———————
Talks 
[x ] Mainen Lab - Evidence or Value based encoding of World State/Probability - ‘Consecutive failures’ - easy/medium/hard estimate of where the reward will be.  
Reading for the Week
[x] Chapter 9 - Propagating Signal | The Action Potential
[/ ] Ligaya et. al (2018)  (CCU S.I.?)
[x] Katz & Castillo (1952) Experiment where they describe measurement techniques
[  ] Raiser Chapter 4 - Stimulus Outlasting Calcium Dynamics in Drosophila Kenyon Cells Encode Odor Identity 
Video Lectures
[—  ] Linear Algebra (Trudge steadily through) 
[ — ] Khan Academy Logarithms (Trudge steadily through) 
MatLab
[  ] Trudge steadily through www.mathworks.com/help/matlab/learn_matlab 
*FIND PROBLEM SET/TEXT BOOK/WORK SHEETS*
Concepts to Grasp
[ / ] Master logarithms!
[  ] Review Kandel Et. Al  Part II *Chapters 5-9*
Neuroanatomy
[ x ]  Ink Figure 28
Project Planning?  Too soon! Too soon! Read some literature on the subject.  
17/9/18
1:00 p.m. Meet with Catarina to discuss “CCU Science Illustrated” (WIP) Project
2:30 p.m. Vivarium Induction 
_______________________________________________________
|      SPCAL Credentials     |
|   |
| login: |
| PW:   |
-————————————————————————
——
NPR:: https://www.npr.org/sections/health-shots/2018/09/11/644992109/can-a-barn-owl-s-brain-explain-why-kids-with-adhd-can-t-stay-focused
9.13.18
[ x ] Pauses in cholinergic interneuron firing exert an inhibitory control on striatal output in vivo (Zucca et al. 2018)
[ x ] Chapter 8 - Local Signaling: Passive Properties of 
-> Sub and supra threshold membrane potential (Conceptual) 
Monday, Sept. 10th 2018
“Eat the Frog First”
[ N/A ] Review SPCAL Lessons 1-5 (In Library?) CRAM THURSDAY? 
-> [/] wait for confirmation from Delores for theoretical test 
-> (Out of Office reply from person in charge)
To Do:
[/] Comment Out %PRE_PROCESS_vBeta.m 
[x] Change path name and run program in MatLab
[  ] Solve trial.blahblahblah error spkCount?  labels?
[  ] Change Epochs and run? 
[x] Chapter 7 - Membrane Potential :: Return to Pg. 136-137 Box 7-2 when sharp. ::
[x] Castillo and B. Katz (1954) 
[x] 12:00 - Neural Circuits for Vision in Action CCU
[x] 2:30 - THESIS DEFENSE: Mechanisms of Visual Perceptions in the Mouse Visual Cortex 
————
Extra-credit
[x] Ink Figure 24
[~ ] Finish “First & Last 2017” (100/127 = 78.74%)
——
Jax Laboratory Tools: https://www.jax.org/jax-mice-and-services/model-generation-services/crispr-cas9
Recommendation for Design and Analysis of In Vivo Electrophysiology Studies 
http://www.jneurosci.org/content/38/26/5837
On the Horizon: 
Schultz (1997) (Classic, classic, classic) 
*[x] 9/7/18 - 6:00 p.m. Flip water for Bruno’s mice *
ITI Data Analysis -> Next step ->…. 
[  ] (find the sigmoid call) /  Poke around preprocessing_beta 
Reading 
[x] Chapter 6 - Ion Channels
[ / ] Finish Krietzer 2016 —> [  ] write an experiment-by-experiment summary paper
Resource: https://www.youtube.com/watch?v=GPsCVKhNvlA Helpful explanation of ChR2-YFP, NpHR, and general ontogenetic principles.
[ / ] Reiser Chapter 3.3.38 - 3.4 (Need to finish 3.4.5, Look up Photoionization detectors, Coherence) 
Neuroanatomy
[/]  Finish Figure 24 (need to ink)
“Drawing Scientists “
[/] Storyboard for GCAMP6s targeted paper 
-> Show Filipe for feedback ->
-> Ask Leopold permission ? Talk to Catarina 
[  x] 16:9 
[x] Write script and record [ 1:00 ] 
Intellectual Roaming
[ / ] Return to Review of Reviews and Review Zoom-In | First & Last | 
[/] Explore Digital Mouse Brain Atlas 
9/6/18 - Thursday 
To Do: 
ITI Data Analysis :
[x] Draw data structure on mm paper -> Reach out for help understanding 
[ / ] What fields did Asma call?  What fields are necessary for a psychometric curve
Reading 
[x] Kandel - Chapter 5 | Synthesis and Trafficking of Neuronal Proteins 
[ / ] Reiser - Chapter 3 | A High-Bandwidth Dual-Channel Olfactory Stimulator for Studying Temporal Sensitivity of Olfactory Processing (Results complicated) 
[/ ] Krietzer 2016 - Cell-Type-Specific Controls of Brainstem Locomotor Circuits by Basal Ganglia
Talks:
[x] 12:00 p.m.  - Colloquium - Development of Drosophila Motor Circuit 
Tutorials: 
~ [x ] MatLab plotting psychometric curves 
Neuroanatomy 
[ x ] Outline brain for figure 24
———
MatLab
Laser stuff HZ noise, thresholds, 
// PCA -> Co-variance -> 
// Linear regression | Geometric intuition -> “What is known to the animal during inter-trial?  What features can be described by animals history”  ===> Construct a history space (axis represent different animals history ex. x-axis previous stimulus, reward, etc.?)  Predictive (?)  
Plot psychometric functions || PSTH (post stimulation of histogram )  of example neurons -> skills: bin spiking, plot rasters, smoothing (if necessary) 
Data:: Access to Dropbox -> /data/TAFC/Combined02/ [3 animals :: Elife] 
/data/TAFC/video
Tiago and Flipe know the video data
File Format -> Parser/Transformation (guideline) || 
> MatLab
Access to MatLab -> [/] 28 days!
How can I begin to analysis?
History dependent | Omitted 
——
To Do Week of September 3rd
Monday
Administrative
[ x] Check-in with HR (Don’t bombard!): Badge.   (Library access?) 
[  ] Reach out to SEF?
[x] 2:00 p.m. Meet with Asma - discuss data analysis.  Where is it?  How do I access it (Tiago?)  What has been done and why?
[x] 3:00 p.m. Lab Meeting “Maurico’s Data” - Pay special attention 
[x] Finish first read through of Theoretical Laboratory Animal Science PDF Lectures
[  ] Rat Surgery Techniques…
Mouse neuroanatomy project
[/ ] Figure 24
[  ] Figure 28
Math 
[x ] L.A. Lecture 2
[ x] L.A. Lecture 3 
Read:
[  ] Georg Raiser’s Thesis (Page 22 of 213)
Find time to do at least an hour of quiet focused reading a day.  (Place?).
Continue to explore whims, papers, databases, ideas, protocols, that seem interesting. 
Develop ‘literature scour’ protocol - (Nature Neuroscience, Neuron, Journal of Neuroscience) 
Dates to Remember: September 14th - Laboratory Animal Sciences Theoretical Test! 
https://www.sciencedaily.com/releases/2018/08/180827180803.htm:Can these be used for techniques?  
https://www.sciencedaily.com/releases/2018/08/180823141038.htm ‘Unexpected’ - Unexpected physical event and unexpected reward or lack of reward (neuronal modeling of external environment) 
In my first ten minutes at work I’m exposed to a weeks (month/year/decade) worth of interesting information.  Going from an intellectual tundra to an intellectual rain forest.  
1460 proteins with increased expression in the brain: Human Protein Atlas https://www.proteinatlas.org
Non-profit plasmid repository: https://www.addgene.org 
Protein database: https://www.rcsb.org/3d-view/3WLC/1
Started to think at the molecular level.   
“MGSHHHHHHGMASMTGGQQMGRDLYDDDDKDLATMVDSSRRKWNKTGHAVRAIGRLSSLENVYIKADKQKNGIKANFKIR
HNIEDGGVQLAYHYQQNTPIGDGPVLLPDNHYLSVQSKLSKDPNEKRDHMVLLEFVTAAGITLGMDELYKGGTGGSMVSK
GEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDF
FKSAMPEGYIQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNLPDQLTEEQIAEFKEAFSL
FDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGDGTIDFPEFLTMMARKGSYRDTEEEIREAFGVFDKDGNG
YISAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK” - CCaMP6m amino acid code. 
  8/31/18 - (Friday) @12:00 in Meeting Room 25.08
GET USB ! ! 
[Lisboa Cultura na ru, Lisbon on the streets Com’Out Lisbon - Katie Gurrerirra ]
MatLab -> Chronux Neural Analysis 
SEPTEMBER 14th!
Week of August 27th, 2018
“Conserved computational circuitry, perhaps taking different arguments on different locations of Basal Ganglia” - Tuesday 
Andrew Barto: http://www-all.cs.umass.edu/~barto/
Basal Ganglia Labs
Okihide Hikosaka Lab: https://irp.nih.gov/pi/okihide-hikosaka
Wilbrecht Lab
Uchida N.  (ubiquitous dopamine motivation and reward) 
Peter J. Magill
Schultz (Pioneer in the field)
C. Savio Chan 
Doya, K. (theory) 
Calabresi, P. (muscarinic) 
Ann Graybiel (McGovern) 
James C. Houk (1994 - Book on Models of Computation in the basal Ganglia)
Evolutionary Conservation of Basal Ganglia type action-selection mechanisms: 
https://www.sciencedirect.com/science/article/pii/S0960982211005288
Dopamine D1 - Retinal Signaling https://www.physiology.org/doi/full/10.1152/jn.00855.2017 [Note to self: Too Off Track]
[ ~ ] Fluorophore Library
Official Badge? [  ] Printer Access [  ]?
Online Course on Laboratory Animal Science 
Monday  : 11 [x] 12 [x] 
Tuesday : 13 [x] 14 [x] 
Wednesday: 15 [x] 16 [/] 
Thursday: 17 [x] 18 [x]
Friday: 19 [x] 20  [/] 
Lesson 11 - Behavior and Environment, animals must be housed in an environment enriched to maximize their welfare. 
Lesson 12 - Rodent and Lagomorph Accommodation and Housing - A more comprehensive guide from the macro environment, facilities i.e. establishments, to the micro environments.  Covers health and safety procedures for personnel as well as geometry of housing units (rounded edges to prevent water accumulation).  Absolutely essential.  
Lesson 13 - Collecting Samples and Administrating Procedures - covers the most common collection techniques and materials collected and stressed the importance of doing as little harm as possible to the animal.  
Lesson 14 - Transporting the Animal : Shipper holds most of the responsibility.  Major goals are making sure the journey is as stress free as possible, contingency plans are in place, and that all of the logistics have been carefully planned, communicated, and coordinated between various parties responsible in the shipping.  Also, animals should be prepared mentally and physically for the journey and should have a period of post-transportation to adjust to the new surroundings and environment.  A number of practical issues must be considered such as temperature, availability of food, and access to animals during the journey.  Boxes should be properly labelled in whatever languages are necessary. 
Lesson 15 - The purpose of feeding and nutrition is to meet the energy needs of the animals, which vary by species, physiological state of animal (growth, maintenance, gestation, and lactation).  A number of category of diets exist as well as a variety of specific diets to best fits the needs of the experiment.  This chapter covers particulars of nutrition requirements and stresses the importance of avoiding obesity and malnutrition.  
Lesson 16 - Anatomy and Physiology of Teleosts (Skip for now: Focus on Rodents and Lagomorphs)
Lesson 17 - Anatomy and Physiology of Rodents and Lagomorphs - General characteristics of the anatomy and physiology of six species, 5 rodents and 1 lagomorph.  Mice, rats, guinea pigs, gerbils, and hamsters.  Rabbits.  It covers particularities of each species and has a quiz asking specific facts, mostly centered on commonalities and distinguishing factors.  Worth a close read.  
Lesson 18 - Anaesthesia and Analgesia in Rodents and Lagomorphs. Pre-anaesthesia techniques, drug combinations, and repeated warnings of the importance of choosing the right drugs and technique for the species.  Use of a chamber.  Methods of anesthesia (IP, IV, volatile).  Endotracheal intubation for rabbits; the proper use and administration of analgesics; monitoring during the operation (for example - the paw pain reflex disappears in medium to deep anesthesia).
Lesson 19 - Animal Welfare and Signs of Disturbance - This chapter repeatedly stresses the importance of the relationship between the caretaker and the animal.  It repeats the ideal social, environmental, and nutritional environments for rodents and rabbits and highlights peculiarities of each species.   After reading this one should be better suited to detecting stress, disease, or other ailments in a laboratory animal.  
Lesson 20 - Fish Psychology and Welfare (Skip for now: Focus on Rodents and Lagomorphs) 
Lessons 5, 17, and 20 pertain to fish 
TEST SEPTEMBER 14th 
MIT Open Course Ware:
Linear Algebra 
Lecture 2 [/ ] -> Elimination by Matrices, production of elementary matrices, basic computations, and a review of row and column approaches to systems of equations.  Introduction to the basic application of the rule of association in linear algebra.  
Lecture 3 [ ]
Mouse Neuroanatomy 
Ink Figure 16 [x]
Figure 20 [x]
Figure 24 [  ]
Introduction to MatLab:  https://www.youtube.com/watch?v=T_ekAD7U-wU [  ] 
Math Big Picture: Review Single Variable Calculus!  Find reasonable Statistics and Probability Course (Statistical Thinking and Data Analysis?  Introduction to Probability and Statistics?) Might as well review algebra while I’m at it, eh.  
Breathe in.  Breathe out.  
Data analysis :: Behavioral Analysis 
Ana Margarida - Lecture 6 - Handling Mice techniques 
EuroCircuit can make a piece.  Commercial v. DYI version of products.  
Dario is the soldering, hardware expert.  I.E. skilled technician. 
www.dgv.min-agricultura.pt; it is recommended that the entry on Animal Protection and the section on Animals used for experimental purposes be consulted first. 
Sir Ronald Fisher, stated in 1938 in regards to this matter that “To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of”. 
——
Finally, it is time to publish and reveal the results. According to Santiago Ramón y Cajal, scientific writers should govern themselves by the following rules: 
Make sure you have something to say; Find a suitable title and sequence to present your ideas; Say it; Stop once it is said. 
8/21 Goals
Access ->
:: Champalimaud Private Internet [HR]  Printer [HR]
:: Web of Science (?)
:: PubMed (Nature, Journals, etc.?) 
:: 
———
PRIORITY:  Online Course -> Animal Laboratory Sciences PDF’s 
20 total -> 4 a day || I can finish by Friday 
Monday  : 1 [x] 2 [x] 
Tuesday : 3 [x] 4 [x ] 
Wednesday: 5 [x*] 6 [x] 
Thursday: 7 [x*] 8 [x]
Friday: 9 [x ] 10 [x ] 
Notes:
Lesson 1 - Philosophical and ethical background and the 3 R’s
Lesson 2 - Euthanasia.   Recommended, adequate, unacceptable.  Physical or chemical.  Chemical - inhalable or injectable.   Paton Lab uses CO2 and cervical dislocation.   
Lecture 3 - Experimental Design.  Return to as a starting point for basic design (randomized samples and blocks) Integrate with “Statistical Thinking and Data Analysis”
Lecture 4 - Legislation.  Memorize specific laws and acts.
Lecture 5 is highly specific for the care and maintenance of Zebrafish
Lecture 6 - Handling of rodents and mice.  A theoretical overview, this material is essentially kinesthetic.  
Lecture 7 - Provides a technically detailed account of how genetic manipulations are done and propagated.   Deserves a ‘printed’ review and vocabulary cross reference.
Lecture 8 - Health and Safety.  Predominantly common sense.   
Lecture 9 - Microbiology - contains an appendix with list of common infections that will be eventually be good to know.
Lesson 10 - Anaesthesia pre and post operation techniques, risks of infections etc. 
// http://ec.europa.eu/environment/chemicals/lab_animals/member_states_stats_reports_en.htm
http://ec.europa.eu/environment/chemicals/lab_animals/news_en.htm -> General European News regarding 
http://www.ahwla.org.uk/site/tutorials/RP/RP01-Title.html -> Recognizing pain in animals 
Week of 8/20/18 To Do:
Tiago/Team -> Whats the most important priority?
Get Arduino Machine working again [?]
Jupyter/Python Notebook Up [ ]
Bruno MatLab Access [… ]
 - Get documents to HR
 - Animal Lab certified?
 - Logistical/Certificate/Etc.  
  - Start discussing personal project: 
    >  (Rat colony) Wet Lab
    > (Machine Learning) Electric Lab
    > Statistics project
  - Reacquaint with Lab Technology/Protocols 
  - Review papers - Engage back with the science 
  - 
Project Print: Screen shots
[  ] collect 
“Do the job.  Do it engaged.   Engage -> Not just execute the best you can, understand the experiment.
Why? Alternative designs?  Control experiments needed to interpret the data?  Positive controls and negative controls?  What do you need to do to get crisp.  Totally engage.  
How it fits into other experiments?  
“Engage with the science as if it were your baby.”
Execute beautifully… Ask --- et. al.  What does ideal execution look like 
Extra time: allocate time.  Technicians : Freedom to do other things, work with other things, other technical things, giving people independent project to carry out.    Project --- has in mind?  Design.   Hands on education of how science works then reading.   Spend time focused on a problem and in the ideal become the world’s foremost expert on whatever ‘mundane’ aspect of what ever problem you are working on.
Computational in the context of a problem.  Learn to use.   Defining “problems I want to solve.”   As an operating scientist, the technology can change very quickly.   Capable of learning, understanding, and applying.  
Answer questions in a robust way.  Thinking of technology in context of problem.   Deep domain knowledge; focus on experimental more than book reading.   
Realistic path -> Research fellow to PhD. program.  Industry…  Strong head’s up to do research.   First-rate OHSU?  Excellent.    IF: Remember that it is narrow, broader with neuroscience as a component.   Biology < > Neurology.   Real neuroscience computational ->
Juxtaposed: Engineering, CS, A.I., and all that…
Label in broad ways: Molecular, cellular, systems, cognitive, psychology.   Borders are so fuzzy — as to be 
Domain bias.   In general -> other than P.I. protected from funding.  Publication, the life of the business.   Metric of success is the science they publish.    Work that contributes to being an author = more engaged, more independent.   Evolved to an independent project.    
So incredibly broad -> CRISPR, GFP, Optogenetics, with higher level systems problems.   100 years = absurd.   Look back -> Could we have conceived whats going on today.  
Foremost expert on something how-ever limited.  Grow from there.   Grown from a particular expertise.    
Molecular biologist || Do what a 3 year old is taught to do.  How?  How?  How?  How does that work.  Quantum physics.   Ask questions.  Be open.   
Go to seminars  -> Go to every talk.  Take every note.  Primary literature fundamentally different.   Always learn in context.  Don’t dilute too much (ignore title, abstract, discussion).  Look at figures and tables and derive for yourself what they say.   Look for THE FIGURE or THE TABLE that is the crux and look for the control experiment.    Understand the critical assessment, are the facts valid and warranted?  Infinite amount to learn, don’t spread yourself infinitely thin.  “ 
 To Do: Develop Independent Machine Learning Project 
Gain Access to Web of Science 
————
Paton Learning Lab
Personal Learning Goals 
September 1st - December 1st 
Major Goals 
[  ] Read Principles of Neuroscience 5th Edition
[  ] Complete CS 229 
[  ] Deep read 12 papers (Write summary || Practice peer review)
Administrative
[  ] Reactivate 
[ / ] Figure out Residence Permit/Visa
Lifestyle
[ x ] Purchase commuter bicycle
[ / ] Purchase waterproof computer/messenger bag
Language
[x] …. Focused practice minimum 20 minutes daily …? 
[  ]   Find language partner 
[  ] Portuguese film/television/music 
UPCOMING
Phone conversation with --------
Tuesday, August 7th 9:00 a.m. EST (10:00 a.m. 
0 notes
betterbemeta · 8 years ago
Note
I remember reading and enjoying HP and the methods of rationality when it was coming out, but I never did finish it. Penny for your thoughts about it? I thought it was a cool take, though it did take away fundamental elements of the original HP characters. But again, I was reading it like 5 years ago and haven't touched it since and a lot of my views and tastes have changed since then
OK so the thing is that you need to know about the guy who wrote it. This isn't a callout take like “he’s the Worst” or anything. But Eliezer Yudkowsky’s context seriously changes a lot of what he writes about. He’s entirely untrained in his fields of interest-- he’s a homeschooled dude with no college degree, for example. And yeah, nobody has to go to university, but he positions himself as an expert and his AI research as research. No matter how much mathematics or science jargon he uses, he’s not... a scientist. He is as credible on his science as I am, with the exception that his opinion blog on it is science-focused and very popular and that he has a nominally science-related nonprofit organization. He has only a few self-published papers. Despite claiming to be a computer programmer he has very little published code. And only two JSTOR cites-- one of which was by a personal friend.
He’s a transhumanist with a nonprofit organization (that doesn’t actually produce many peer-reviewed papers), a blog, etc. He believes that, despite the extremely slow pace of real AI research, the Singularity is coming, and he overemphasizes Bayesian probability even over the scientific method he models in that story.
So this kind of alters a lot of the meat of his fanfic. When I read it the first time at age 15 or so (?) I admittedly enjoyed Harry Potter and the Methods of Rationality! But then I thought more critically about it. Would a child steeped in wizard racism really be discouraged by, basically, Punnett squares? How ‘rational’ is this Harry, really? Was that part with his patronus being a human to express his enlightenment thought-provoking or really patronizing? Was the tone revealing or thoughtful or just disrespectful of the intelligence of all of the other characters in the original work?
Don’t get me wrong-- I love the concept of a skeptic Harry approaching the magical world with reason or science. But upon a second read, knowing more about the writer, the fanfic seems to be the full extent of Yudkowsky’s credibility. And if Harry in the story expresses that dude’s ideals, it doesn’t frame him as a particularly thoughtful person. And Yudkowsky has a legion of fans who do behave as if that fanfic models his ideals. It’s a different situation than, say, when a writer portrays abuse or violence in their materials and we can assume they are telling a story and not endorsing it, because most writers don’t have large nonprofit organizations that exist to suggest the perspective in their stories is essential for the future of humanity.
16 notes · View notes
awildpoliticalnerd · 5 years ago
Text
I had a lot of fun puzzling over this question. I love cooking and I saw the show that this book spun-off on Netflix. To me, these elements don't just make food tasty-- they're also integral to the whole pursuit of cooking.
Like, we use these tastes and processes to do creative new things to food. They're what experts wield to bring the most out of something, they're things that invite creative new dishes--but they're also already at the core of cooking as is. Baked in, as it were.
So I wanted to come up with 4 things that were like that for statistics and data science--at least as I understand them. The 4 I came up with were: Visualization, Simulation Methods, Gradient Descent, & Wholistic (often qualitative) subject matter expertise.
Visualization isn't just to make data pretty for end users--although that's often icing on the cake. They also allow US to understand what we're seeing in the data. They're both means and ends of new understanding.
Gradient descent underlies so many of the tools we use. Using an MLE method? It probably uses gradient descent (or a close relative) to maximize the likelihood function. And odds are you're using an MLE method but don't realize it.
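If that sounds abstract, a toy example helps. The sketch below (simulated data, arbitrary step size) does maximum-likelihood estimation of a Gaussian mean by running plain gradient descent on the negative log-likelihood, which is the basic move hiding inside many fitting routines:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=500)   # simulated data

mu, lr = 0.0, 0.1
for _ in range(200):
    grad = -np.sum(x - mu)          # gradient of the negative log-likelihood (sigma fixed at 1)
    mu -= lr * grad / len(x)        # one gradient-descent step
print(mu, x.mean())                 # converges to the sample mean, the closed-form MLE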
Simulation methods similarly underlie so much of what we do. Everything from Bayesian methods to bootstrapping and a lot more. And more stuff is coming out all the time. (And I know my knowledge is pretty superficial here).
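A minimal bootstrap sketch (made-up data) shows the flavor: resample with replacement, recompute the statistic, and read an uncertainty interval straight off the resamples, with no closed-form derivation required.

import numpy as np

rng = np.random.default_rng(1)
data = rng.exponential(scale=2.0, size=200)    # skewed, awkward-for-formulas data

boot_medians = [np.median(rng.choice(data, size=len(data), replace=True))
                for _ in range(5000)]
lo, hi = np.percentile(boot_medians, [2.5, 97.5])
print(f"median {np.median(data):.2f}, 95% bootstrap interval ({lo:.2f}, {hi:.2f})")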
Last but not least, deep (often qualitative) subject knowledge. We often don't realize the assumptions about the world we lean on when we interpret results, but they're there. And they can't all be appreciated by quantitative inference alone.
One of my most formative quant experiences was a class on quantitative issues in studying race and ethnicity. I realized how much we take for granted the ideas we proxy with numbers. How solid they seem. Yet how contingent they really are. Like cooking over flame, there is life there, vibrant and fluid. It is humbling to work with and needs to be respected. And you can really only do that if you appreciate the broader context.
0 notes
eurekakinginc · 6 years ago
Photo
"[D] Best resources on Bayesian neural networks"- Detail: I am currently interested in learning more about quantifying uncertainty in deep learning, primarily by using Bayesian methods, as there has been a lot of promising research published lately (my background is from statistics, so might be a bit biased here).I have already had a look at papers and theses by Y. Gal, C. Blundell, A. Kendall, R. Neal, and some more, and think I have a basic understanding of ideas such as Bayes by backprop, and Dropout as Bayesian Approximation. It's really exciting to see how existing methods and best practices turn out to be justified by Bayesian reasoning. Of course some of it still feels a bit ad hoc, and the really interesting thing I think would be actively using Bayesian statistics to develop new algorithms to improve and add features existing architectures.What are some interesting resources going through various aspects and methods of Bayesian neural networks? I have really enjoyed Yarin Gal's blog, and would love more to see more resources in that style. I am also very interested in good resources on TF Probability. Any suggestions would be highly appreciated!TL;DR: What are the best resources, i.e. papers, Medium posts, tutorials, GitHub repos, etc., to get started with Bayesian neural networks? In particular I would be very interested articles or posts comparing various methods and approaches to BNN, or commented code using TensorFlow Probability for doing (approximate) Bayesian inference in neural networks.​. Caption by davinci1913. Posted By: www.eurekaking.com
0 notes
savetopnow · 7 years ago
Text
2018-03-25 07 FITNESS now
FITNESS
12 Minute Athlete
Track Star Outdoor Challenge Workout
Full Body Bar HIIT Workout
Flexibility Challenge Week 3: Combo Stretching
Full Body Sandbag HIIT Workout
Jump Rope Strength Challenge Workout
Bayesian Bodybuilding
Do you need 4 meals per day for maximum growth after all?
Bro splits optimal after all? [New study review]
Minimalism, non-responders & training frequency [podcast]
New training frequency study: 3x vs. 6x
Is aspartame safe?
Bodybuilding.com
Bodybuilding.com Media
Kris Gethin's 8-Week Hardcore Trainer FAQs!
FST-7 Big And Ripped Essentials: Upper Body
Lift As Strong As You Look
3 Bizarre Splits To Help You Break Your Plateau
Breaking Muscle
Quick Tips For CrossFit Open WOD 18.5
The Holy Trinity of Holistic Training
Subversive Fitness: Day 329 of 360
Money, Race, Gender, and Their Impact on Obesity
Subversive Fitness: Day 328 of 360
Neghar Fonooni
Be more confident in 30 seconds
The Blast Radius
Authenticity is the only way
Autonomy: A short story
Stay Soft Through the Storm
Nerd Fitness
How a single busy mom lost 100 pounds with Nerd Fitness
5 Hacks to Effortlessly Build Healthy Habits in 2018
How Kenney the Tabletop Gamer Lost 120 Pounds and Found His Voice (Literally).
The 10 Key Differences Between Weight Loss Success and Failure
How Many Calories Do You Burn While Walking?
Reddit Fitness
Thinking of working out 4 days a week instead of 3 - Lvysaur 4-4-8
Adding juggernaut method conditioning
Best cardio workout?
PPL vs Nsuns
Percentages for various rep ranges
Strength Sensei
Mayor Rango De Movimiento Para Hacerse Más Grande Y Más Fuerte
Faster Strength Gains Through Brain Boosting
Hormonal Optimization: Simple Guidelines
How To Learn: 6 Surefire Tricks to Train Your Brain
Four Things I Learned From Charles R. Poliquin
Summer Tomato
What to Do When You Stop Dieting and It Backfires
FOR THE LOVE OF FOOD: It’s OK to let your kids trick-or-treat, gut fungi are a thing, and how to reclaim your mornings
How a Tiny Habit Can Help You Push Past a Weight Loss Plateau
FOR THE LOVE OF FOOD: How to be perpetually healthy, questioning the sustainability of online meat, and what it means to be a supertaster
Could Coffee Be Preventing You From Conceiving?
T-Nation
Tip: Over-Pulling On Pull-Ups and Pulldowns
Tip: Do This Before Every Workout
Tip: Use a Band For This Glute Exercise
Tip: The Most Outlawed Exercise
Tip: Parkinson's Law and Effective Workouts
Yoga Dork
Doggie Not Down With Human Doing Yoga
10 Ways To Resist As A Yogi
Reflections On Michael Stone, Mental Health And Yoga’s Cult Of Positivity
Are You Ready For The Eclipse? Warnings, Rituals And Other Fascinating Tidbits
Stop Outsourcing Your Self-Care
0 notes
fastforwardlabs · 8 years ago
Text
Thomas Wiecki on Probabilistic Programming with PyMC3
A rolling regression with PyMC3: instead of the regression coefficients being constant over time (the points are daily stock prices of 2 stocks), this model assumes they follow a random-walk and can thus slowly adapt them over time to fit the data best. 
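A hedged sketch of the idea in that caption (this is not Quantopian's code; the price series are simulated and the priors are arbitrary): let the regression coefficient follow a Gaussian random walk so it can drift slowly over time.

import numpy as np
import pymc3 as pm

n = 300
rng = np.random.default_rng(2)
true_beta = 1.0 + np.cumsum(rng.normal(0, 0.02, n))    # slowly drifting relationship
price_a = rng.normal(0, 1, n)
price_b = true_beta * price_a + rng.normal(0, 0.1, n)

with pm.Model():
    sigma_beta = pm.HalfNormal("sigma_beta", 0.1)                    # how fast beta may drift
    beta = pm.GaussianRandomWalk("beta", sigma=sigma_beta, shape=n)  # one coefficient per day
    sigma_obs = pm.HalfNormal("sigma_obs", 1.0)
    pm.Normal("obs", mu=beta * price_a, sigma=sigma_obs, observed=price_b)
    trace = pm.sample(1000, tune=1000, target_accept=0.9)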
Probabilistic programming is coming of age. While normal programming languages denote procedures, probabilistic programming languages denote models and perform inference on these models. Users write code to specify a model for their data, and the languages run sampling algorithms across probability distributions to output answers with confidence rates and levels of uncertainty across a full distribution. These languages, in turn, open up a whole range of analytical possibilities that have historically been too hard to implement in commercial products.
One sector where probabilistic programming will likely have significant impact is financial services. Be it when predicting future market behavior or loan defaults, when analyzing individual credit patterns or anomalies that might indicate fraud, financial services organizations live and breathe risk. In that world, a tool that makes it easy and fast to predict future scenarios while quantifying uncertainty could have tremendous impact. That’s why Thomas Wiecki, Director of Data Science for the crowdsourced investment management firm Quantopian, is so excited about probabilistic programming and the new release of PyMC3 3.0.
We interviewed Dr. Wiecki to get his thoughts on why probabilistic programming is taking off now and why he thinks it’s important. Check out his blog, and keep reading for highlights!
A key benefit of probabilistic programming is that it makes it easier to construct and fit Bayesian inference models. You have a history working with Bayesian methods in your doctoral work on cognition and psychiatry. How did you use them?
One of the main problems in psychiatry today is that disorders like depression or schizophrenia are diagnosed based purely on subjective reporting of symptoms, not biological traits you can measure. By way of comparison, imagine if a cardiologist were to prescribe heart medication based on answers you gave in a questionnaire! Even the categories used to diagnose depression aren’t that valid, as two patients may have completely different symptoms, caused by different underlying biological mechanisms, but both fall under the broad category “depressed.” My thesis tried to change that by identifying differences in cognitive function -- rather than reported symptoms -- to diagnose psychiatric diseases. Towards that goal, we used computational models of the brain, estimated in a Bayesian framework, to try to measure cognitive function. Once we had accurate measures of cognitive function, we used machine learning to train classifiers to predict whether individuals were suffering from certain psychiatric or neurological disorders. The ultimate goal was to replace disease categories based on subjective descriptions of symptoms with objectively measurable cognitive function. This new field of research is generally known as computational psychiatry, and is starting to take root in industries like pharmaceuticals to test the efficacy of new drugs.
What exactly was Bayesian about your approach?
We mainly used it to get accurate fits of our models to behavior. Bayesian methods are especially powerful when there is hierarchical structure in data. In computational psychiatry, individual subjects either belong to a healthy group or a group with psychiatric disease. In terms of cognitive function, individuals are likely to share similarities with other members of their group. Including these groupings into a hierarchical model gave more powerful and informed estimates about individual subjects so we could make better and more confident predictions with less data.
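As a rough illustration of the partial-pooling idea described here (invented data and variable names, not the models from the thesis): per-subject parameters are drawn from a shared group-level distribution, so each individual's estimate borrows strength from the rest of the group.

import numpy as np
import pymc3 as pm

n_subjects, n_trials = 8, 40
rng = np.random.default_rng(3)
true_rates = rng.beta(6, 4, size=n_subjects)
successes = rng.binomial(n_trials, true_rates)     # per-subject task performance

with pm.Model():
    group_mu = pm.Beta("group_mu", 2, 2)           # group-level tendency
    group_kappa = pm.Gamma("group_kappa", 2, 0.1)  # group-level concentration
    theta = pm.Beta("theta",
                    alpha=group_mu * group_kappa,
                    beta=(1 - group_mu) * group_kappa,
                    shape=n_subjects)              # per-subject rates, partially pooled
    pm.Binomial("obs", n=n_trials, p=theta, observed=successes)
    trace = pm.sample(1000, tune=1000)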
Bayesian inference provides robust means to test hypotheses by estimating how different two different groups are from one another. 
How did you go from computational psychiatry to data science at Quantopian?
I started working part-time at Quantopian during my PhD and just loved the process of building an actual product and solving really difficult applied problems. After I finished my PhD, it was an easy decision to come on full-time and lead the data science efforts there. Quantopian is a community of over 100,000 scientists, developers, students, and finance professionals interested in algorithmic trading. We provide all the tools and data necessary to build state-of-the-art trading algorithms. As a company, we try to identify the most promising algorithms and work with the authors to license them for our upcoming fund, which will launch later this year.  The authors retain the IP of their strategy and get a share of the net profits.
What’s one challenging data science problem you face at Quantopian?
Identifying the best strategies is a really interesting data science problem because people often overfit their strategies to historical data. A lot of strategies thus often look great historically but falter when actually used to trade with real money. As such, we let strategies bake in the oven a bit and accumulate out-of-sample data that the author of the strategy did not have access to, simply because it hadn’t happened yet when the strategy was conceived. We want to wait long enough to gain confidence, but not so long that strategies lose their edge. Probabilistic programming allows us to track uncertainty over time, informing us when we’ve waited long enough to have confidence that the strategy is actually viable and what level of risk we take on when investing in it.
It’s tricky to understand probabilistic programming when you first encounter it. How would you define it?
Probabilistic programming allows you to flexibly construct and fit Bayesian models in computer code. These models are generative: they relate unobservable causes to observable data, to simulate how we believe data is created in the real world. This is actually a very intuitive way to express how you think about a dataset and formulate specific questions. We start by specifying a model, something like “this data fits into a normal distribution”. Then, we run flexible estimation algorithms, like Markov Chain Monte Carlo (MCMC), to sample from the “posterior”, the distribution updated in light of our real-world data, which quantifies our belief into the most likely causes underlying the data. The key with probabilistic programming is that model construction and inference are almost completely independent. It used to be that those two were inherently tied together so you had to do a lot of math in order to fit a given model. Probabilistic programming can estimate almost any model you dream up which provides the data scientist with a lot of flexibility to iterate quickly on new models that might describe the data even better. Finally, because we operate in a Bayesian framework, the models rest on a very well thought out statistical foundation that handles uncertainty in a principled way.
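A minimal concrete version of that workflow in PyMC3 (simulated data; priors chosen for illustration): write down the generative story, then hand the model to a general-purpose sampler and get back the full posterior.

import numpy as np
import pymc3 as pm

data = np.random.default_rng(4).normal(loc=1.2, scale=0.8, size=200)   # simulated observations

with pm.Model():
    mu = pm.Normal("mu", mu=0, sigma=10)       # prior belief about the mean
    sigma = pm.HalfNormal("sigma", 5)          # prior belief about the spread
    pm.Normal("obs", mu=mu, sigma=sigma, observed=data)
    trace = pm.sample(1000, tune=1000)         # NUTS, a Hamiltonian sampler, by default

print(trace["mu"].mean(), trace["sigma"].mean())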
Much of the math behind Bayesian inference and statistical sampling techniques like MCMC is not new, but probabilistic tooling is. Why is this taking off now?
There are mainly three reasons why probabilistic programming is more viable today than it was in the past. First is simply the increase in compute power, as these MCMC samplers are quite costly to run. Second, there have been theoretical advances in the sampling algorithms themselves, especially the class of Hamiltonian Monte Carlo samplers. These are much more powerful and efficient in how they explore the posterior, allowing us to fit highly complex models. Instead of sampling at random, Hamiltonian samplers use the gradient of the model to focus sampling on high-probability areas. By contrast, older packages like BUGS could not compute gradients. Finally, the third required piece was software for automatic differentiation -- an automatic procedure to compute gradients of arbitrary models.
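A small illustration of the automatic-differentiation piece, using Theano (the library PyMC3 builds on); the expression here is arbitrary and purely for demonstration:

```python
import theano
import theano.tensor as tt

x = tt.dscalar('x')
y = x ** 2 + tt.sin(x)      # an arbitrary model-like expression

dy_dx = theano.grad(y, x)   # gradient derived automatically, no hand-written math
grad_fn = theano.function([x], dy_dx)

print(grad_fn(1.0))         # 2*1 + cos(1), roughly 2.54
```

Hamiltonian samplers like NUTS consume exactly this kind of automatically derived gradient to steer their proposals toward high-probability regions of the posterior.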
What are the skills required to use probabilistic programming? Can any data scientist get started today or are there prerequisites?
Probabilistic programming is like statistics for hackers. It used to be that even basic statistical modeling required a lot of fancy math. We also used to have to sacrifice the ability to really capture the complexity in the data in order to keep models tractable, leaving them too simple. For example, with probabilistic programming we don’t have to assume our data is normally distributed just to make our model tractable. That assumption is everywhere because it’s mathematically convenient, but very little real-world data actually looks like that! Probabilistic programming enables us to capture these complex distributions. The required skills are the ability to code in a language like Python and a basic knowledge of probability so you can state your model. There are also a lot of great resources out there to get started, like Bayesian Analysis with Python, Bayesian Methods for Hackers, and of course the soon-to-be-released Fast Forward Labs report!
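For instance, swapping the convenient-but-fragile normality assumption for a heavy-tailed likelihood is roughly a one-line change in a probabilistic program; this sketch uses made-up, outlier-laden data:

```python
import numpy as np
import pymc3 as pm

# Made-up data with heavy tails / outliers.
data = np.concatenate([np.random.randn(95), np.random.randn(5) * 15])

with pm.Model() as robust_model:
    mu = pm.Normal('mu', mu=0, sd=10)
    sigma = pm.HalfNormal('sigma', sd=10)
    nu = pm.Exponential('nu', lam=1 / 30)  # degrees of freedom, learned from the data

    # Student-T likelihood instead of Normal: robust to the outliers.
    pm.StudentT('obs', nu=nu, mu=mu, sd=sigma, observed=data)

    trace = pm.sample(1000, tune=1000)
```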
Congratulations on the new release of PyMC3! What differentiates PyMC3 from other probabilistic programming languages? What kinds of problems does it solve best? What are its limitations?
Thanks, we are really excited to finally release it, as PyMC3 has been under continuous development for the last 5 years! Stan and PyMC3 are among the current state-of-the-art probabilistic programming frameworks. The main difference is that Stan requires you to write models in a custom language, while PyMC3 models are pure Python code. This makes model specification, interaction, and deployment easier and more direct. In addition to advanced Hamiltonian Monte Carlo samplers, PyMC3 also features streaming variational inference, which allows for very fast model estimation on large data sets as we fit a distribution to the posterior, rather than trying to sample from it. In version 3.1, we plan to support more variational inference algorithms and GPUs, which will make things go even faster!
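As a rough sketch of what a pure-Python model plus the variational option looks like -- note that the variational API has shifted between PyMC3 releases, so treat the `pm.fit` call below as indicative rather than definitive:

```python
import numpy as np
import pymc3 as pm

# Made-up regression data, large enough that variational inference pays off.
x = np.random.randn(10000)
y = 3.0 * x + 1.0 + np.random.randn(10000)

with pm.Model() as linreg:
    intercept = pm.Normal('intercept', mu=0, sd=10)
    slope = pm.Normal('slope', mu=0, sd=10)
    sigma = pm.HalfNormal('sigma', sd=5)
    pm.Normal('obs', mu=intercept + slope * x, sd=sigma, observed=y)

    # Variational inference: fit a distribution to the posterior instead of
    # sampling from it -- much faster on large data sets.
    approx = pm.fit(n=20000, method='advi')
    trace = approx.sample(1000)
```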
For which applications is probabilistic programming the right tool? For which is it the wrong tool?
If you only care about pure prediction accuracy, probabilistic programming is probably the wrong tool. However, if you want to gain insight into your data, probabilistic programming allows you to build causal models with high interpretability. This is especially relevant in the sciences and in regulated sectors like healthcare, where predictions have to be justified and can’t just come from a black box. Another benefit is that because we are in a Bayesian framework, we get uncertainty in our parameters and in our predictions, which is important in areas where we make high-stakes decisions under very noisy conditions, like finance. Also, if you have prior information about a domain, you can build it directly into the model. For example, let’s say you wanted to estimate the risk of diabetes from a dataset. There are many things we already know even without looking at the data, like the fact that high blood sugar dramatically increases that risk -- we can build that into the model by using an informed prior, something that’s not possible with most machine learning algorithms.
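A hedged sketch of how that domain knowledge could enter a model, assuming a hypothetical dataset with standardized predictors and a binary diabetes label (the feature names and prior values are illustrative, not clinical guidance):

```python
import numpy as np
import pymc3 as pm

# Hypothetical, standardized predictors and a binary outcome.
blood_sugar = np.random.randn(500)
age = np.random.randn(500)
has_diabetes = np.random.binomial(1, 0.2, size=500)

with pm.Model() as diabetes_model:
    intercept = pm.Normal('intercept', mu=0, sd=5)
    # Informed prior: we already believe high blood sugar raises the risk,
    # so this coefficient is centered on a positive value.
    beta_sugar = pm.Normal('beta_sugar', mu=1.0, sd=0.5)
    # Weakly informative prior where we have less domain knowledge.
    beta_age = pm.Normal('beta_age', mu=0.0, sd=1.0)

    logit_p = intercept + beta_sugar * blood_sugar + beta_age * age
    pm.Bernoulli('obs', p=pm.math.sigmoid(logit_p), observed=has_diabetes)

    trace = pm.sample(1000, tune=1000)
```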
Finally, hierarchical models are very powerful, but often underappreciated. A lot of data sets have an inherent hierarchical structure. Take, for example, individual preferences of users on a fashion website. Each individual has unique tastes, but often shares tastes with similar users: people are more likely to have similar taste if they are the same sex, are in the same age group, or live in the same city, state, or country. Such a model can leverage what it has learned from other group members and apply it back to an individual, leading to much more accurate predictions, even when we might only have a few data points per individual (which can lead to cold-start problems in collaborative filtering). These hierarchies exist everywhere but are all too rarely taken into account properly. Probabilistic programming is the perfect framework to construct and fit hierarchical models, as in the sketch below.
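A compact sketch of the partial-pooling idea, assuming hypothetical per-observation preference scores grouped by country (names, shapes, and numbers are all illustrative):

```python
import numpy as np
import pymc3 as pm

# Hypothetical data: a preference score per observation, plus the country
# group each observation belongs to.
n_countries = 5
country_idx = np.random.randint(0, n_countries, size=300)
scores = np.random.randn(300) + country_idx * 0.3

with pm.Model() as hierarchical_model:
    # Hyperpriors: what the country groups have in common.
    mu_global = pm.Normal('mu_global', mu=0, sd=5)
    sigma_global = pm.HalfNormal('sigma_global', sd=5)

    # Per-country means are drawn from the shared distribution, so sparse
    # groups borrow strength from the others (partial pooling).
    mu_country = pm.Normal('mu_country', mu=mu_global, sd=sigma_global,
                           shape=n_countries)
    sigma_obs = pm.HalfNormal('sigma_obs', sd=5)

    pm.Normal('obs', mu=mu_country[country_idx], sd=sigma_obs, observed=scores)

    trace = pm.sample(2000, tune=1000)
```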
Interpretability is certainly an issue with deep neural nets, which also require far more data than Bayesian models to train. Do you think Bayesian methods will be important for the future of deep learning?
Yes, and it’s a very exciting area! As we’re able to specify and estimate deep nets or other machine learning methods in probabilistic programming, it could really become a lingua franca that removes the barrier between statistics and machine learning, giving us a common tool to do both. One thing that’s great about PyMC3 is that the underlying library is Theano, which was originally developed for deep learning. Theano helps bridge these two areas, combining the power nets have to extract latent representations out of high-dimensional data with variational inference algorithms to estimate models in a Bayesian framework. Bayesian deep learning is hot right now, so much so that NIPS offered a day-long workshop on it. I’ve also written about the benefits in this post and this post, explaining how Bayesian methods provide more rigor around the uncertainty and estimation of deep net predictions and enable better simulations. Finally, Bayesian deep learning will also allow us to build exciting new architectures, like hierarchical Bayesian deep networks that are useful for transfer learning. A bit like the work you did to get stronger results from Pictograph using the Wordnet hierarchy.
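A bare-bones sketch of a Bayesian neural network along those lines: priors over the weights, a Theano forward pass, and variational inference for estimation. The toy data, single hidden layer, and `pm.fit` call are illustrative assumptions rather than a recommended architecture:

```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

# Toy binary-classification data (illustrative only).
X = np.random.randn(200, 2)
y = (X[:, 0] * X[:, 1] > 0).astype(int)

n_hidden = 5

with pm.Model() as bayesian_nn:
    # Priors over the weights are what turn an ordinary net into a Bayesian one.
    w_in = pm.Normal('w_in', mu=0, sd=1, shape=(2, n_hidden))
    w_out = pm.Normal('w_out', mu=0, sd=1, shape=(n_hidden,))

    # Forward pass expressed symbolically in Theano.
    act_hidden = tt.tanh(tt.dot(X, w_in))
    p_out = pm.math.sigmoid(tt.dot(act_hidden, w_out))

    pm.Bernoulli('obs', p=p_out, observed=y)

    # Variational inference scales better than MCMC for models like this.
    approx = pm.fit(n=30000, method='advi')
    trace = approx.sample(500)
```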
Bayesian deep nets provide greater insight into the uncertainty around predicted values at a given point. Read more here. 
What books, papers, and people have had the greatest influence on you and your career?
I love Dan Simmons’ Hyperion Cantos series, which got me hooked on science fiction. Michael Frank (my PhD advisor) and EJ Wagenmakers first introduced me to Bayesian statistics. The Stan guys, who developed the NUTS sampler and black-box variational inference, have had a huge influence on PyMC3, and they continue to push the boundaries of applied Bayesian statistics. I also really like the work coming out of the labs of David Blei and Max Welling. We hope that PyMC3 will likewise become an influential tool for the productivity and capabilities of data scientists across the world.
How do you think data and AI will change the financial services industry over the next few years? What should all hedge fund managers know?
I think it’s already had a big impact on finance! And as the mountains of data continue to grow, so will the advantage computers have over humans in their ability to combine and extract information out of that data. Data scientists, with their ability to pull that data together and build predictive models, will be at the center of attention. That is really at the core of what we’re doing at Quantopian. We believe that by giving people everywhere on earth a state-of-the-art platform for free, we can find that talent before anyone else can.